"When you can measure what you are speaking
about and express it in numbers, you know
something about it; but when you cannot
measure it, when you cannot express it in
numbers, your knowledge is of a meager,
unsatisfactory kind."


Sverre  Petterssen  used  this  quotation  from 
Lord  Kelvin  as  a  frontispiece  in  his  Volume 
1 of Weather Analysis and Forecasting,
published by McGraw-Hill in 1956.


PROBABILITY  FORECASTING  -  REASONS,  PROCEDURES,  PROBLEMS 


Lawrence  A.  Hughes 
Central Region

Kansas  City,  MO 


ABSTRACT.  This  paper  is  intended  as  a  comprehensive  discussion 
of  probability  forecasting,  based  mainly  on  that  for  precipitation 
probability.  It  is  primarily  intended  to  cover  points  of  concern 
to  those  persons  making  probability  forecasts,  but  it  also  covers 
points  pertinent  to  those  making  management  decisions  concerning 
a  probability  program.  Also  included  is  a  history  of  probability 
forecasting,  and  an  extensive  set  of  references  for  those  wanting 
more  information. 
1. INTRODUCTION

The  purpose  of  this  Memorandum  is  to  discuss  the  meaning  of,  the 
formulation  of,  and  the  use  of  probability  forecasts  in  meteorology. 
Much  of  the  material  is  based  on  the  precipitation  probability 
effort  of  the  National  Weather  Service  (NWS),  especially  that  of 
the  special  verification  program  of  the  NWS  Central  Region.  However, 
other  types  of  probability  forecasts  will  be  mentioned  where 
appropriate. 

The  purpose  of  a  weather  forecast  is  to  help  people  make  better 
weather-dependent  decisions.  Thus,  we  are  concerned  mainly  with 
the  forecast  user.  As  long  as  the  forecaster  cannot  always  make 
a forecast with complete certainty, i.e., make categorical forecasts
that are always correct, the user of the forecasts needs
to  know  the  forecaster's  degree  of  certainty  (probability)  at  the 
time  of  forecast  issuance.  Probability  words  like  "chance"  and 
"likely"  express  probability  crudely,  even  if  they  are  well  defined 
and  used  consistently. 

Making accurate forecasts is not sufficient for the decision maker
either. This was noted by both Gringorten (1958) and Borgman (1960),
but Borgman put it best when he concluded: "Accurate forecasts
are not sufficient to guarantee a high utility unless the forecaster
makes the required effort to word his forecasts so that they
are usable in decision making." Malone's statement (1956) augments
this  by  indicating  that  "In  communication,  cognizance  must  be  taken 
of  indeterminancy  in  a  way  that  will  optimize  the  usefulness  of 
the  forecast  in  making  decisions.  This  leads  directly  to  probability 
forecasts."   Thus,  probability  numbers  are  the  most  precise  means 
of  communicating  what  the  forecaster  has  in  mind  in  that  they  clearly 
qualify  certainty.  REMEMBER— PROBABILITY  IS  ONLY  A  MEANS  OF 
COMMUNICATION.  It  is  thus  a  much  simpler  concept,  especially  for 
the  user,  than  many  people  get  by  connotation  from  the  word 
"probability." 

Because  probability  is  a  precise  means  of  communication,  it  is  easy 
to  notice  errors  in  it,  and  it  is  easy  to  make  errors  in  creating 
it;  therefore,  it  is  important  that  the  forecaster  use  it  correctly, 
or  the  user  can  easily  be  misled.  Murphy  and  Winkler  (1971a,  1971b, 
1974b)  conducted  a  survey  of  NWS  and  other  forecasters  and  noted 
that  forecasters  had  problems  understanding  the  probability  concept. 
These  problems  must  be  resolved  for  the  program  to  reach  its  full 
potential.  They  suggested,  and  Murphy  (1977b)  reiterated,  the  need 
for  an  educational  program  for  both  the  forecasters  and  the  public. 
This Technical Memorandum is intended to serve that purpose for the
forecaster, but it also has some suggestions for public education by
NWS people.


To  be  stressed  are  the  user-oriented  items  of  meaning  and  usage  of 
probability  forecasts,  but  also  treated  will  be  items  of  interest 
mainly  to  the  forecaster,  such  as  how  such  forecasts  are  made,  how 
combined,  how  verified,  and  how  they  can  be  used  for  continuous 
events  like  ceiling  height  or  temperature.  Because  of  its  precision, 
probability  requires  more  effort  by  the  forecaster,  but  less  by  the 
user. It is therefore user-oriented, and not only for the
sophisticated user, as we shall see.

This  memorandum  is  intended  as  a  comprehensive  source  of  information 
on  probability  forecasting,  but,  of  course,  it  doesn't  contain  all 
that  is  known  or  even  all  that  might  be  useful.  However,  it  is 
the  intent  to  make  the  memorandum  more  comprehensive  by  referencing 
many  papers  which  could  provide  additional  information  for  those 
wanting more on specific points. To make the memorandum more useful
as a reference, the Contents pages list the topics covered. A look
there should put you close to the right page for the information you
want. Note that areal coverage problems are discussed in two
places.


2.  HISTORY 

Probability  forecasting  goes  back  at  least  as  far  as  Cooke  (1906a) 
in  Australia  when  he  reported  on  his  experiment  in  the  use  of 
confidence  factors.  Cooke  appended  one  of  five  confidence  factors 
after  each  part  of  the  weather  forecast  (precipitation,  temperature, 
etc.), with five indicating very high confidence and one a so-so
confidence (50% probability). These five numbers were equivalent
to ten of ours because he could use the five for a categorical
"rain" or a categorical "sunny." Verifying about 2000 of these
numbers, Cooke found the following percent correct for his categories:
5: 99%, 4: 94%, 3: 79%, 2: 56%, 1: 58%. There was controversy and
misunderstanding in his day, too, because he gave a short follow-up
item  (Cooke  1906b)  in  which  he  attempted  clarification.  There  he 
said  that  he  wasn't  changing  the  structure  of  the  forecast  because 
the  numbers  were  merely  appended  to  the  regular  style  forecast, 
which  is  our  reason  for  the  present  style. 

You  may  be  wondering  whether  confidence  factors  and  probability  are 
the  same  thing.  Basically,  yes.  Confidence  factors  represent 
probabilities,  although  generally  with  some  loss  in  precision,  even 
after they are defined by verification, as Cooke's were. But there is
a slight, but real, difference besides the lowered precision:
confidence factors are used for categorical forecasts and
represent the forecaster's confidence in his forecast, while
the probability, on the other hand, is the forecaster's confidence
that the event will occur. It is not a confidence in his forecast,
because then it would be a probability of a probability.

During  World  War  I,  according  to  a  U.S.  Army  Signal  Corps  report 
(1919),  the  French  and  British  Meteorological  Services  provided 
forecasts  to  the  American  Expeditionary  Force  which  contained  odds 
in  favor  of  the  forecast.  It  is  interesting,  again,  that  these 
were  appended  to  the  forecast.  Also,  the  report  indicated  that  the 
forecast  was  an  unqualified  categorical  one,  because  they  did  not 
use  qualifiers  such  as  "probable"  or  "possibly." 

Next  came  a  paper  from  the  U.S.  Weather  Bureau  (USWB)  on  forecasting 
precipitation  in  probability  terms  (Hallenbeck  1920),  using  10% 
increments  of  probability.  This  was  done  mainly  to  help  decisions 
involving irrigation in the southwestern United States. Brier (1944),
of the USWB, discussed forecaster confidence and the value of probability
in a paper that is hard to find, perhaps because it was
classified at the time and thus may have had quite limited distribution.
In the last paragraph of this paper, Brier made the
following  important  points:  "Every  one  of  us  is  called  upon  to  make 
decisions  of  one  kind  or  another,  decisions  such  as  whether  to  carry 
the  umbrella  to  the  office,  whether  to  set  up  protective  devices 
to  save  the  fruit  from  a  freeze,  or  whether  to  send  bombers  over 
Berlin  on  Friday  or  Saturday  night.  The  decisions  of  a  rational 
man  will,  to  a  large  extent,  depend  upon  his  estimates  of  the 
probabilities  of  the  different  events  and  the  consequences  of  them. 
When  he  is  convinced  that  the  weather  forecaster's  estimates  of 
these  probabilities  are  better  than  his  own,  he  will  come  to  him  for 
weather information. But, in general, it will be up to this individual
(not the forecaster) to decide what course of action to
take. He should not be given a 'pessimistic' forecast or some other
'biased' forecast. This will sometimes happen when the forecaster
has in mind some particular operation for which the forecast might
be used. However, so far as the scientific problem of weather forecasting
is concerned, the forecaster's duty ends with providing
accurate  and  unbiased  estimates  of  the  probabilities  of  different 
weather  situations." 

Efforts  toward  making  probability  forecasts  objective  increased 
after  World  War  II.  Price  (1949),  USWB,  developed  an  objective 
technique  for  forecasting  the  probability  of  thunderstorms; 
Dickey  (1949),  USWB,  did  the  same  for  temperature  change;  and 
Berkofsky (1950) did it for fog. Gentry (1950) created a quasi-objective
scheme for forecasting areal coverage of summer showers
in Florida. Areal coverage forecasts are not as good as the point
probability forecasts the NWS uses now, but this paper was a step
in the right direction, and his areal coverages were quite close to
point probabilities because showers occur almost every day in
Florida  in  summer.  The  relationship  of  areal  coverage  to  point 
probability  will  be  discussed  in  detail  later. 

Efforts  toward  determining  probabilities  objectively  actually  go 
at  least  as  far  back  as  Besson  (1904),  who  related  surface  observed 
variables  to  the  probability  of  precipitation,  although  his  final 
product  was  a  categorical  forecast.  An  interesting  point  here  was 
that  he  noted  that  combining  variables  was  of  little  or  no  use, 
because  of  the  strong  dependence  among  them.  However,  this  effort 
was  with  European  data  and  may  not  be  as  valid  in  the  United  States. 

Williams (1951) restarted probability experiments by Weather Bureau
operational forecasters with an experiment in the use
of confidence factors. He noted the near equality of confidence
factors and probability. Later, in the 1950's, The Travelers
Insurance Company started broadcasting weather to the public in
the Hartford, CT, area, including probabilities for precipitation.
These broadcasts are continuing.

In  1952  Schroeder  (1954),  of  the  USWB  at  Chicago,  started  an 
experiment  using  carefully  defined  probability  words  in  his  fire- 
weather  forecasts.  He  noted  that  forecasters  showed  skill  at  this 
use,  and  he  concluded  that  they  should  continue  to  do  it.  Also 
started  in  1952  and  going  public  in  the  local  forecasts  in  January 
1953  and  continuing  to  date  were  the  probability  efforts  of  the  USWB 
office  at  Hartford,  CT,  as  reported  by  Wassail  (1966). 

Roger Allen, when temporarily in charge of the Weather Bureau's
St. Louis office in early 1954, ran a probability
experiment for precipitation and temperature, but these forecasts
were not released to the public. At that time the aviation briefings
of  the  office  gave  probabilities  for  ceiling  and  visibility.   These 
efforts  were  not  reported  in  the  meteorological  literature. 

While  the  above  shows  a  long  history  of  work  and  experiments  in 
probability  forecasting,  the  present  era  of  probability  forecasts 
to  the  public  is  probably  best  related  to  experimental  probability 
forecasts  started  in  California  after  a  series  of  papers  on  objective 
methods  for  making  such  forecasts  was  published  (Vernon,  1947; 
Vernon  and  Stoneback,  1952;  Jorgensen,  1949,  1953;  Thompson,  1946, 
1950). These forecasts were first issued to the public in 1956 in
San Francisco, and probability forecasts have continued there to
date. They started in Los Angeles in 1957. The San Francisco
efforts  were  reported  on  by  Root  (1958,  1961,  1962). 

In  early  1960,  the  eight  Research  Forecasters  of  the  Weather  Bureau 
(each  at  one  of  the  major  forecast  offices—Boston,  Massachusetts; 
Chicago,  Illinois;  Kansas  City,  Missouri;  Denver,  Colorado;  Salt 
Lake  City,  Utah;  San  Francisco,  California;  Seattle,  Washington;  and 
Anchorage,  Alaska)  in  one  of  their  meetings  noted  the  success  of 
the  California,  Hartford,  and  other  probability  experiments.  They 
endorsed  the  probability  concept  and  recommended  that  each  forecast 
center  continue  or  initiate  such  a  program.  This  was  generally 
done. The results of some of these efforts were reported by Dickey
(1965), Diemer (1965), Hughes (1965), and Stallard, et al. (1965).
Each  also  included  explanatory  material  on  probability  forecasting 
and  verification,  with  the  most  comprehensive  that  of  Hughes. 

As  a  result  of  these  and  earlier  efforts  and  with  realization  of 
the  potential  of  such  forecasts,  Weather  Bureau  Headquarters  in 
1965,  under  the  authority  of  R.  Simpson  and  E.  Vernon,  with  technical 
guidance by C. Roberts, authorized the start of nationwide forecasting
for precipitation occurrence. At first, the effort was
a  trial  and  learning  program,  with  the  forecasts  not  released  to 
the  public.  These  trials  were  reported  by  Dickey  (1966),  Diemer 
(1966),  Dunn  (1966),  and  Hughes  (1966a).  The  public  release  began 
in the first half of 1966. The first public forecasts were reported
on by Dickey (1967), Hughes (1967a), and Roberts, et al.
(1967).  There  have  been  some  additional  verification  summaries  by 
the  NWS  after  these,  both  regional  and  national,  and  routine 
verification  continues  to  date  for  all  NWS  Central  Region  offices 
and selected NWS offices on a national basis. However, specialized
verifications (seasonal, etc.) then took on interest, as discussed
later.

Subjective  probability  guidance  from  the  National  Meteorological 
Center  (NMC)  started  on  facsimile  in  October  1965.  The  objective 
probability  forecasts  produced  by  NMC  through  the  efforts  of  the 
Techniques  Development  Laboratory  (TDL)  and  statistical  methods 
were  discussed  early  by  Glahn  (1962)  and  later  by  Glahn  and  Lowry 
(1972).  These  objective  forecasts  began  in  1969  for  the  eastern 
United  States  (NWS,  1969)  and  for  the  conterminous  48  states  about 
January  1,  1972  (NWS,  1971).  These  probabilities  are  generated 
by  combining  numerical  and  statistical  models  through  a  technique 
called Model Output Statistics (MOS).

This history has had to be
selective, with the intent to give an overview of the development
of probability forecasting, especially from material most likely
to be available in NWS Offices. Papers on or related to probability
were considerably more numerous after 1950 and they became more
sophisticated  as  well  as  more  objective.  Those  desiring  more 
material  should  refer  to  the  bibliography  on  the  subject  (Murphy 
and  Allen  1970).  More  recent  papers,  especially  those  in  the  1970's, 
are  given  by  Murphy  (1976).  Many  additional  papers,  especially  those 
more  recent,  will  be  referenced  later  in  this  memorandum. 

3.   PURPOSE  AND  USE  OF  PROBABILITY  FORECASTS 

H.  Roberts  (1968)  stated  the  problem  of  the  forecaster  well  when 
he  said  of  probability  forecasting,  "Insight  does  not  come  easily: 
the meaning of probability entails subtleties that invite misunderstanding
and tempt misapplication even by those who are most
receptive  to  the  idea  of  expressing  forecasts  in  probabilistic 
rather  than  deterministic  form."  In  this  section  and  the  next,  we 
will  talk  about  a  number  of  the  details  forecasters  must  be  aware 
of  in  making  these  forecasts. 

a.  Words  vs.  Numbers 

Emmons (1940) wrote that it was difficult to know from the forecast
what the forecaster had in mind. He concluded that, "What
we need, therefore, is a codification of forecast terms, in which
precise definitions are given then the man in the street can
judge for himself whether today's weather will be good, bad, or
indifferent from his personal point of view." Landsberg (1940)
stated  much  the  same  thing  when  he  said,  "A  weather  forecast  in 
order  to  be  useful  has  to  be  specific  and  should  be  worded  so  that 
the  general  public  understands  what  is  meant  by  the  forecast." 
Quite  a  bit  later,  Dickey  (1956)  said,  "The  important  things  are 
the  percentage  definitions  of  the  terms  and  the  strict  adherence 
to  the  terms  once  they  have  been  decided  upon  and  defined." 

Landsberg made another point that may be the reason some forecasters
and possibly some managers are not fully in favor of
probabilities in forecasts. In regard to specific word usage, he
said,  "There  are  two  approaches  to  this  subject.  One  is  from  the 
viewpoint  of  the  forecaster  and  the  verification  of  forecasts, 
the  second  is  from  the  viewpoint  of  the  customer  or  user  of  the 
forecast.  The  forecaster  wants  to  see  his  forecast  verified;  the 
less  specific  his  forecast  or  the  more  latitude  a  term  according 
to his own definition...the better his imagined score. The general
public  wants  information."  He  stated  that  the  public  is  often 
unaware  of  the  wide  leeway  that  the  forecaster  permits  himself  in 
using certain terms. Since probability numbers are the most precise
expression of the forecaster's certainty, they reduce the leeway
and therefore are of maximum potential benefit to the forecast user.


Finally, according to Knox (1969), the indiscriminate use of undefined
probability (word) modifiers seriously reduced the effectiveness
of the Canadian public forecasts prior to 1946. The Canadians
then shifted to purely categorical forecasts, while the USWB
stayed with probability modifiers, but, as noted by Hughes (1965,
p31), they were still not used consistently. In fact, almost the
whole  spectrum  was  used  when  no  precipitation  was  mentioned.  This 
extreme  lack  of  consistency  of  words  vs.  numbers  is  one  of  the  strong 
arguments  for  the  use  of  numbers. 

In 1966, the USWB established a set of words well-defined in probability
terms, and these are used rather consistently today.
However,  public  surveys  have  shown  that  forecast  users  do  not 
correctly  understand  varying  certainty  from  words  (Rogell  1972  and 
1976,  and  Eastern  Region  1973,  all  summarized  by  Hughes  1978a). 
In  the  1973  reference  there  was  sampling  in  which  the  public  had 
no  idea  of  the  certainty  from  the  probability  word,  i.e.,  the 
frequency  of  choice  of  the  four  probability  ranges  given  was 
essentially equal to a random selection. On the other hand, everyone
knows that the certainty is low when it is 10% and high when
70%, and they know it is going up when it is 30% for today and 40%
for tonight; these surveys showed this. Probability numbers are
the  most  precise  and,  therefore,  the  most  useful  way  to  express 
varying  certainty. 

The  above  is  not  to  say  that  the  probability  should  carry  all  the 
information,  and  words  none.  For  example,  in  the  precipitation 
forecast,  the  probability  describes  well  the  certainty  as  to  whether 
measurable  precipitation  will  occur  or  not,  but  the  body  of  the 
forecast,  if  it  is  well  done,  should  contain  such  other  information 
as  the  forecaster  can  give,  such  as  the  type,  amount,  and  start  or 
stop  time  of  the  precipitation.  Forecasters  do  not  do  enough  of 
this  sort  of  thing,  tending  to  prove  right  Landsberg's  point 
quoted  above. 

b.  Use  of  Forecasts  and  the  Cost/Loss  Ratio 

Some  forecasters  think  the  public  wants  and  needs  a  yes-no  type 
forecast,  because  the  public's  decision  is  one  of  doing  or  not 
doing  some  particular  thing.  This  is  mostly  true,  but  it  is  not  a 
sufficient  reason  for  making  a  categorical  forecast.  A  categorical 
forecast  is  really  a  two-probability  forecast  in  which  the 
probabilities  are  usually  unknown.  That  is,  taking  rain  forecasts 
as  an  example,  the  two  probabilities  are  the  percent  correct  of  the 
rain  forecasts  and  the  percent  correct  of  the  no-rain  forecasts 
after subtracting the latter from 100. These are generally around
65% and 15%, depending on the climatic frequency of rain and the
lead  time  to  the  forecast  period.  These  figures  are  known  only 
from  verification,  but  are  not  given  to  users  and  are  probably  not 
really  known  by  forecasters. 
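
As a minimal sketch of how these two implied probabilities could be
extracted from a verification record, the following Python function
takes paired lists of categorical forecasts and observations. The
function name, the list-of-booleans input format, and the sample record
are assumptions made for illustration only; they are not NWS
verification data or procedures.

```python
def implied_probabilities(forecasts, observed):
    """Return the two rain probabilities implied by categorical forecasts.

    forecasts: list of bool, True where "rain" was forecast
    observed:  list of bool, True where measurable rain occurred
    Both values are returned in percent.
    """
    after_rain_fcst = [o for f, o in zip(forecasts, observed) if f]
    after_dry_fcst = [o for f, o in zip(forecasts, observed) if not f]
    if not after_rain_fcst or not after_dry_fcst:
        raise ValueError("need at least one forecast of each category")

    # Percent correct of the rain forecasts.
    pct_correct_rain = 100.0 * sum(after_rain_fcst) / len(after_rain_fcst)
    # Percent correct of the no-rain forecasts, then subtracted from 100.
    pct_correct_dry = 100.0 * sum(not o for o in after_dry_fcst) / len(after_dry_fcst)
    return pct_correct_rain, 100.0 - pct_correct_dry


# Hypothetical record: 20 "rain" forecasts (13 verified) and
# 80 "no rain" forecasts (rain fell after 12 of them).
sample_forecasts = [True] * 20 + [False] * 80
sample_observed = [True] * 13 + [False] * 7 + [True] * 12 + [False] * 68
print(implied_probabilities(sample_forecasts, sample_observed))
```

The sample record was chosen so that the two values come out at 65% and
15%, the figures quoted above; a real record would of course give its
own pair.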

But  no  forecaster  would  really  be  happy  if  forced  to  issue  only 
those  two  probabilities,  because  much  of  the  time  it  would  be  clear 
that  the  degree  of  certainty  is  different  from  either  of  them.  The 
user  wouldn't  be  happy  either.  Experience  clearly  indicates  that 
the  use  of  11  to  13  different  probability  numbers,  mostly  in  10% 
increments, is adequate for the present state of the science. However,
when the climatic frequency of the event forecast is considerably
below  50%,  there  is  evidence  that  forecasters  can  distinguish  between 
low  probability  numbers  quite  close  together  such  as  0,  2  and  5% 
(see  Hughes,  1965,  p26).  Such  discrimination  could  have  value  in 
drier  climates. 

Using  a  fixed  set  of  probability  numbers  for  all  climates  can  lead 
to  less  useful  forecasts.  There  is  some  rationale  for  saying  that 
there should be about as many probabilities available to the forecaster
that are below the climatic frequency of precipitation as
there  are  above  this  frequency.  This  is  obviously  less  and  less 
the  case  with  any  fixed  set  of  probabilities  as  the  frequency  gets 
low  (practically  all  of  the  frequencies  for  the  6  and  12-hour  periods 
in  the  United  States  are  less  than  40%,  with  cold  season  frequencies 
in  the  Northwest  and  in  the  Great  Lakes  area  the  most  likely 
exceptions).  This  indicates  that  the  use  of  the  really  low  probabilities 
is  more  reasonable  and  necessary  in  dry  climates  (low  frequency 
climates). 
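
The imbalance can be seen by simply counting. The sketch below is
illustrative only: the 11-value set in 10% steps and the 13-value set
that adds 2% and 5% are assumed here as examples of "a fixed set" and
of a set with extra low values, and the function name is hypothetical.

```python
def count_below_and_above(allowed_values, climatic_frequency):
    """Count the allowed probability values (percent) that lie below and
    above the climatic frequency (percent)."""
    below = sum(1 for p in allowed_values if p < climatic_frequency)
    above = sum(1 for p in allowed_values if p > climatic_frequency)
    return below, above


# Assumed example sets, in percent.
STANDARD_SET = list(range(0, 101, 10))           # 0, 10, ..., 100
EXTENDED_SET = sorted(set(STANDARD_SET) | {2, 5})  # adds 2 and 5

for frequency in (50, 15):
    print(frequency,
          count_below_and_above(STANDARD_SET, frequency),
          count_below_and_above(EXTENDED_SET, frequency))
```

With a 50% climatic frequency the standard set is balanced, five values
on each side. With a 15% frequency only two values lie below it and
nine above; adding 2% and 5% brings the count below to four.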

The  user's  decision  to  do  or  not  to  do  something  is  the  basis  for 
a  precise  way  to  use  probability  numbers.  Let  us  look  at  several 
examples.  First,  let's  take  a  forecast  probability  that  the 
minimum  air  temperature  will  fall  below  28°F.  In  an  orchard,  in 
spring,  temperatures  below  this  value  can  cause  damage  to  the  future 
crop  if  the  trees  are  not  protected,  so  the  orchard  manager  has  to 
decide whether or not to protect his crop with heaters. He knows
that it will cost quite a bit to run those heaters for a number of
hours, in addition to the expense of lighting them, putting them
out, and refueling them, plus wear and tear from use. He also knows that
loss of his crop would be very expensive. If he puts dollar values
on  these  two,  i.e.,  the  cost  to  protect  and  the  loss  if  caught 
unprotected  when  damaging  cold  temperatures  prevailed,  and  takes 
the  ratio  of  protection  cost  over  loss,  he  will  get  a  threshold 
for  decision.  A  reasonable  value  of  the  ratio  for  the  orchardist 
might  be  5%.  Thus  anytime  the  forecast  calls  for  a  probability 
higher  than  that,  he  takes  protective  measures,  otherwise  he  does  not. 
This is overly simplified, but nevertheless it is the principle of
decision making using probability [1]. The 5% is called the cost-loss
(C/L) ratio, and is the threshold for this task. Note that this
threshold  is  quite  low.  But  you  say  this  is  probably  a  sophisticated 
decision  maker,  what  about  the  general  public?  No  problem.  The 
same  principle  is  used  when  deciding  whether  or  not  to  carry  an 
umbrella  or  to  buy  life  insurance.  In  fact  every  decision  by  each 
of  us  is  made  in  the  same  way. 
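
The reasoning behind the threshold can be written out in a few lines.
The sketch below assumes the same simplification as the text: protecting
costs C and fully prevents the loss, while not protecting costs nothing
unless the event occurs, in which case the loss is L. Not protecting
then has an expected expense of p times L, which exceeds C exactly when
p is greater than C/L. The function names and the dollar figures are
hypothetical, chosen only to give the 5% ratio used above.

```python
def cost_loss_ratio(cost, loss):
    """Return C/L, the threshold probability for taking protective action."""
    if cost < 0 or loss <= 0:
        raise ValueError("cost must be non-negative and loss positive")
    return cost / loss


def should_protect(probability, cost, loss):
    """Protect when the forecast probability (0 to 1) exceeds C/L."""
    return probability > cost_loss_ratio(cost, loss)


def expected_expense(probability, cost, loss, protect):
    """Long-run average expense per occasion for one course of action.

    Assumes protection fully prevents the loss (a simplification).
    """
    return cost if protect else probability * loss


# Hypothetical orchard: $500 to run the heaters, $10,000 crop loss.
COST, LOSS = 500, 10_000
for p in (0.02, 0.10):
    print(p,
          should_protect(p, COST, LOSS),
          expected_expense(p, COST, LOSS, protect=True),
          expected_expense(p, COST, LOSS, protect=False))
```

At a forecast of 2% the expected expense of not protecting is below the
cost of protection, so the heaters stay off; at 10% it is above, so
they are lit.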

Because  of  this  experience  of  the  public  in  making  probabilistic 
decisions  without  really  knowing  it,  it  is  easy  for  the  public  to 
make  such  decisions  from  the  precipitation  probability  forecasts 
of the NWS. For example, as heard on a commuter bus after a passenger
got on and was kidded about carrying an umbrella, "30%, that's
good enough for me." This is exactly the way to make this probabilistic
decision. The 30% was greater than his cost/loss ratio,
but  he  would  probably  not  know  what  C/L  was  if  you  asked.  After 
making  such  decisions  a  few  times,  the  decision  maker  may  decide 
to  change  the  threshold.  It  is  by  this  experience  that  the  people 
learn  to  adjust  their  decisions  to  the  best  threshold  probability 
for  them.  The  threshold  could  easily  be  different  depending  on 
whether  the  umbrella  carrier  is  in  work  clothes  or  dress  clothes, 
as well as the amount of time out of doors. Thus, the threshold
changes with conditions, and the thresholds for all the decisions
within a metropolitan area may well cover almost all of the probability
spectrum (Root 1965).

If  you  were  a  private  meteorologist  forecasting  for  a  particular 
client,  you  might  also  act  as  consultant-decision-maker.  If  so, 
you  should  be  aware  of  the  C/L  ratio  for  particular  things.  In 
that  case,  after  you  have  your  probability  in  mind,  you  can  issue 
a  categorical  forecast  telling  the  company  to  protect  or  not  protect. 
In such a case you are the decision maker. Thus, a categorical forecast
is really only useful when tuned to a particular cost-loss
ratio. When there are many ratios, it is a useless concept. For
the NWS forecasts, since there is a wide spectrum of C/L values
in the area, the forecaster issues the forecast in terms of probability
so each decision maker can decide whether it is above or below the
C/L value(s) of concern. Only probability numbers will allow this
with precision and without ambiguity.

[1] A corollary of this is that forecasts have value only when there are
people to make decisions AND there are alternative decisions that
depend on the forecast; no people or no way to protect equals no
useful forecast. The inverse would also be generally true, that the
more people making decisions and the more alternatives they have, the
more important the forecasts. Of course, the monetary value of the
decisions has some significance as well.

Surveys  and  other  feedback  suggest  that  the  public  understands 
probability  more  and  more  (see  Murphy  1967),  in  spite  of  what  some 
forecasters  think.  For  example,  some  forecasters  say  that  the 
public  doesn't  understand  our  probability  forecasts  because  every 
time  it  is,  say,  40%  or  more  they  take  it  as  though  rain  will  occur. 
But  the  forecasters  are  wrong.  The  public  is  simply  saying  that 
40% is above their threshold for a number of their decisions:
carrying  an  umbrella,  washing  a  car,  leaving  a  car  roof  open  while 
working,  possibly  even  going  to  a  ball  game  or  outdoor  concert. 

As a forecaster, suppose you are debating whether to put out a forecast
of 30% or 40%. What is the effect of your decision? There is
no effect for users with C/L values lower than 30% or higher than
40%, who are the large majority of users. Only those with C/L values between
these  two  probabilities  would  be  affected.  Since  most  probabilities 
are  low,  the  higher  you  are  in  the  probability  range  the  fewer 
users  that  probably  would  be  affected.  Very  few  and  perhaps  none 
would  be  affected  by  uncertainty  between  70%  and  80%.  However,  there 
is  something  satisfying  to  a  forecaster  in  correctly  forecasting 
100%  probability,  even  though  90%  may  have  brought  about  no  change 
in  the  decisions  made,  and  almost  as  good  a  score  in  our  usual 
measures. 
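
A small sketch makes the point concrete. Given a list of user
thresholds, it returns those whose protect/do-not-protect decision
would differ between two candidate forecast values. The list of C/L
values is hypothetical (no survey of actual values is implied), and the
rule "protect when the probability exceeds C/L" is the simplified one
described earlier.

```python
def users_affected(thresholds, p_first, p_second):
    """Return the C/L thresholds whose decision changes between two
    candidate forecast probabilities (all values on a 0 to 1 scale)."""
    return [t for t in thresholds if (p_first > t) != (p_second > t)]


# Hypothetical C/L values for a handful of users, mostly low.
cl_values = [0.05, 0.10, 0.15, 0.25, 0.35, 0.60]

print(users_affected(cl_values, 0.30, 0.40))
print(users_affected(cl_values, 0.70, 0.80))
```

In this made-up list only the user with a threshold of 0.35 is affected
by the choice between 30% and 40%, and no one is affected by the choice
between 70% and 80%.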

A  very  sophisticated  usage  of  the  C/L  principle,  where  more  than 
two  options  are  available,  can  be  found  in  Thompson  (1959),  which 
involved  snow  removal  decisions  by  a  municipality.  A  reprint  of  this 
article  was  distributed  to  every  Weather  Bureau  office  at  the  time. 
The  use  of  this  principle  in  the  raisin  industry  was  discussed  in 
depth  by  Kolb  and  Rapp  (1962),  and  a  multiple-option  use  in  fruit- 
frost  decisions  was  discussed  by  Murphy  and  Thompson  (1977). 
Schwerdt  (1970)  discussed  its  use  on  the  New  York  City  docks.  For 
more  information  on  the  C/L  ratio  and  its  use,  see  Thompson  and 
Brier  (1955)  and  Thompson  (1963,  1966). 

c.  Omitting  Probabilities 

Some  people  argue  that  probabilities  shouldn't  be  mentioned  in 
all  forecasts.  The  reason  usually  given  is  that  precipitation 
shouldn't  be  mentioned  all  the  time.  One  wonders  if  such  persons 
realize  that  omitting  probabilities,  say  below  some  threshold,  is 
valid  only  if  there  are  no  C/L  ratios  in  the  omitted  range. 
No one has examined the distribution of C/L values in the United
States or even in an urban-suburban area. A reasonable distribution,
considering  that  outdoor  activities  must  relate  somewhat  to  the 
climatic  frequency  of  precipitation  (raisin  drying  outdoors  can 
take  place  in  California  but  not  in  Missouri),  is  that  there  is  a 
maximum  of  C/L  values  fairly  near  the  climatic  frequency  of  pre- 
cipitation, with  very  few  values  close  to  the  extreme  probabilities 
of  zero  and  one.  Thus,  since  most  C/L  values  are  low,  it  is  a 
risky  assumption  that  none  are  below  the  cutoff  value,  especially 
with  a  cutoff  as  high  as  30%.  On  the  other  hand,  omitting  high 
values,  say  >10%,  might  not  matter  at  all  for  decision  makers. 
However,  maximum  utility  for  the  large  group  of  decision  makers 
that  exists  in  almost  any  of  our  forecast  areas  is  more  likely 
found  with  the  full  spectrum  of  probabilities.  Then  all  users  have 
a  complete  opportunity  to  make  the  best  decision. 

An  excellent  example  of  this,  such  as  could  occur  with  many  people 
these  days,  occurred  to  me  one  midnight.  I  was  in  bed  on  a  summer 
night  when  I  realized  that  I  had  left  the  roof  open  on  my  Volkswagen 
bus,  and  that  is  a  big  hole.   I  turned  on  the  NOAA  Weather  Radio 
beside  the  bed,  got  a  forecast  of  zero  probability,  and  went  to 
sleep.  To  get  a  personal  feeling  for  the  C/L  ratio  for  yourself, 
think  about  what  percentage  of  the  time  you  would  accept  rain  in 
the  car.   I  am  sure  it  will  be  small.  I  thought  less  than  10%, 
i.e.,  C/L  =  10%.  With  the  omission  of  low  probabilities,  I  would 
have  had  no  choice  but  to  get  up  and  close  the  roof.  P.S.,  it 
was  a  good  zero  forecast. 

I  personally  think  there  are  a  number  of  these  minor  convenience 
decisions  that  low  probabilities  can  help.   Another  example  would 
be  on  an  overcast  day,  with  clouds  thick  enough  to  prevent  shadows. 
A  zero  probability  on  such  a  day,  and  many  of  these  are  easy  to  do, 
really  helps  our  image  of  good  forecasting  because  to  the  public 
it  looks  like  a  threatening  day.  Many  people  wouldn't  mind 
washing  their  car,  etc.,  even  with  the  overcast  condition,  if  zero 
probability  were  explicitly  stated,  especially  if  they  had  learned 
from  experience  that  such  forecasts  are  correct. 

d.  Use  of  Zero  and  100% 

There is little point in using zero and 100% without modifiers.
Purists will object that no one can be that certain, and our
verification data tend to bear that out. Forecasters generally
would not bet at the corresponding odds (see section 4 i). Moreover,
as mentioned in the section above, there probably aren't any C/L
ratios very near these extremes. This suggests that the terms "near
zero" and "near 100%" are better to use.

e.  Value  of  Probability  Forecasts 

If one needed more argument for probability forecasting, a paper
by Thompson (1962) should do the trick. He showed that of the total
possible gain from decision making with perfect forecasts, there is
about as much to gain from the creation and use of probability
forecasts as from scientific advances; he also stated that
decision-making using probability is available now, while the
comparable advances in knowledge will take a millennium to achieve.
Murphy (1977a) put it well when he said, "...the value of day-to-day
weather forecasts could be significantly increased, within the
context of any decision-making situations, if such forecasts were
routinely expressed in probabilistic terms and disseminated to
decision makers (including the general public). In this regard, it
should be emphasized that the benefits which could be expected from
such a probability forecasting program do not depend on scientific
advances in the state of the art of weather forecasting...." Thus,
while research, development, and numerical prediction can lead to
better and more useful forecasts, the use of probability is a highly
efficient and simple way to help decision makers (sophisticated and
unsophisticated) make better decisions right now.


4.   DEFINITIONS  AND  PROBLEM  AREAS 

In  the  routine  local  (zone)  NWS  forecast,  what  is  to  be  forecast  is 
the  average  point  probability  of  a  measurable  amount  of  precipitation 
for  the  local  area  (or  zone)  in  the  time  periods  of  the  forecast. 
This sounds straightforward, but there have been questions or
uncertainty about each part of this definition. Let us look at the
parts.

a.  Probability  vs.  Chance 

First, "probability" is the same as "chance", so that the point
probability at a point is the same as the chance of the event at a
point. Thus, a probability of 30% means that there is a 30% chance
that the event forecast will occur². Point probability is used
because almost all users need the chance of the event in a very small
area such as one's home, business, work site, or play site. Thus,
point probability is used because it suits the user's needs best.

²One could also say that 30% means that 3 times out of 10 that 30%
is used, the event is expected to occur. It is not necessary that
the 10 times be with similar weather situations, as some have said.
Also 3 out of 10 is the same as 3 to 7 odds. It is easy to err when
changing probabilities into odds and vice versa, so odds are not
recommended (see section 4 i).

b.  Average  Point  Probability 

The average point probability over the forecast area is used
because the value should apply to and be usable at any point within
the forecast area. If conditions over the forecast area are uniform
(for example, fair, rain everywhere or a uniform coverage of showers),
the point probability would be the same at each point, so the average
would be the same also. As long as the conditions are uniform over
the forecast area, the point probability is not affected by the size
of the forecast area.

Problems arise when the forecaster does not expect conditions to be
nearly uniform over the local area. In areas which are
climatologically nearly uniform, such as St. Louis, the forecasters
are rarely able to distinguish probability differences across the
area (Winkler and Murphy 1976). At Rapid City, with its prominent
terrain variations in the local area, there are probability
variations over the local area that the forecaster can forecast (see
Murphy and Winkler 1977a). When such conditions can be forecast, the
local area should be split (see next section).

An important point relating to the average point probability is that
if one were to forecast 100% probability, one should be certain that
every point in the local area will get precipitation. Likewise with
zero probability, no point should get precipitation. These are not
good forecasts unless the conditions are met (see also section 3d),
even though, using a single gage for verification, at times the
forecasts could be verified as if there were no error (see section 7d
on use of multiple gages).

c.  Splitting  Local  Area 

Small variations in probability in the forecast area are
neglected, even if they could be forecast. But where terrain or other
features, or even the movement of weather systems, cause sizable
(> 20%) forecastable differences in probability over the area, this
is best handled by splitting the forecast area into two parts. For
example, in Miami "20% near the shore, 60% inland"; in Denver (or
Rapid City) "20% except 50% west portions"; in Chicago "20% except
50% near the lake." It is highly desirable, from the user's
viewpoint, for the forecaster to make such space distinctions. If it
is not done in a place where sizable space variations are commonplace,
forecasts of the average probability are much less useful and users
justifiably complain, sometimes in the formal meteorological
literature (see Curtiss 1968).

d. Point vs. areal probability³

When  you  make  or  use  a  probability,  be  sure  you  understand 
whether  it  is  a  point  or  an  areal  probability.  The  NWS  public 
forecasts  are  and  should  be  for  the  point  probability,  but  some  of 
the  guidance  probabilities  are  areal  probabilities,  e.g.,  severe 
thunderstorms.  If  in  doubt  about  a  guidance  product,  see  the 
appropriate  Technical  Procedures  Bulletin. 

The relationship between point and areal probability that is most
useful to the forecaster, given by Hughes (1965), can be stated as

\[ \bar{P}_p = P_a \, C \tag{1} \]

where $\bar{P}_p$ is the average point probability over the forecast
area, $P_a$ is the areal probability, i.e., the chance of
precipitation for any place in the forecast area, and $C$ is the
conditional areal coverage, i.e., the areal coverage the forecaster
expects to exist
if  precipitation  does  materialize  in  the  area.  Note  that  the  point 
probability  is  almost  always  smaller  than  the  areal  probability, 
that  it  can  never  be  larger,  and  that  they  are  equal  only  when  the 
expected  areal  coverage  is  one--100%.  Forecasters  have  difficulty 
with  this  point  (see  Winkler  and  Murphy  1976).  For  scattered 
showers,  the  areal  probability  is  always  larger.  Also,  the  areal 
probability  usually  increases  as  the  size  of  the  area  increases,  so 
be  sure  to  note  this  size  compared  to  the  size  of  your  forecast  area 
when using guidance given in areal probability terms. The average
point probability over an area is much less sensitive to the size
of the area.
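
As a minimal sketch of Eq. (1), the relation can be written as a small Python function. The function name, argument names, and the sample values are illustrative assumptions; only the product of areal probability and conditional areal coverage comes from the text.

```python
def average_point_probability(areal_probability: float,
                              conditional_coverage: float) -> float:
    """Eq. (1): average point probability = areal probability x coverage."""
    for name, value in (("areal_probability", areal_probability),
                        ("conditional_coverage", conditional_coverage)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1, got {value}")
    return areal_probability * conditional_coverage


# Hypothetical scattered-shower day: rain somewhere in the area is likely
# (80%), but only a quarter of the area is expected to be wetted if it comes.
pop = average_point_probability(0.8, 0.25)
print(f"average point probability: {pop:.0%}")
```

Because the coverage cannot exceed one, the returned value can never exceed the areal probability, which is the property the paragraph above stresses.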

Forecasters  may  still  be  surprised  at  how  large  the  climatological 
areal  probability  can  be.  Beebe  (1952),  extrapolating  to  a  very 
large  number  of  gages,  has  shown  that  in  summer  measurable  rain 
probably  occurred  within  a  50  mile  radius  of  Atlanta  or  Birmingham 
on  as  much  as  85%  of  the  24-hr  days.  This  result  is  corroborated 
by  Smith  and  Smith  (1978).  Thus,  showers  in  the  local  area  in 
summer  are  so  common  in  these  locations  that  the  main  help  of  the 
forecast  comes  in  the  ability  to  distinguish  differing  areal 
coverages, yet D. Smith (1977) has data that show forecasters even
now have little such ability, as noted by Hughes (1977b) and Murphy
(1978b). Causey (1953) showed much the same thing as Beebe, but for
Lincoln, NE, Peoria, IL, and a point in eastern Ohio; in spite
of using only a 35-mile radius, a frequency of about 65% was obtained.


³See also Sections 6a and 7e.

Going back, Eq. (1) was derived with the forecaster's need in mind,
because it separates the problem into the parts to be handled, in
that it separates the event of "some rain" as related to whether or
not the weather system will reach the forecast area from the problem
of the spottiness of the rain, if it does come. It is thus most
useful before the fact. It is obvious from Eq. (1) that the
conditional areal coverage, C, usually does not include all the
uncertainty, and thus is less desirable for decision making.

Another equation is

\[ \bar{P}_p = C_u \tag{2} \]

Here the areal coverage $C_u$ is unconditional. This usage is
evidently more understandable to mathematicians (see Curtiss 1968).
However, it is best used in a climatological sense--after the fact,
and it says that the best probability one could forecast on any
particular period is the areal coverage that actually occurred in
that period. This explains why the spotty showers of summer do not
allow as frequent use of the high point probabilities as does the
widespread precipitation events of winter. It also shows that since
the larger amounts of precipitation must cover even smaller areas
than just a measurable amount, high probabilities would be used even
less for higher precipitation amounts. Eq. (2) equals Eq. (1) when
the forecaster is certain that precipitation will occur in the
forecast area, i.e., $P_a$ = 1.0. Since this is not commonly the
case, Eq. (1) is more useful at forecast time. Also, this
uncertainty is why a forecast of areal coverage is not as useful for
most decision making as a point probability. That is, decision
making requires knowledge of all uncertainty.
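
One way to see how the two equations fit together is sketched below. It is an interpretation rather than a derivation given in the text: it assumes that $C$ in Eq. (1) is the expected coverage given that precipitation occurs somewhere in the area, and it treats the unconditional coverage $C_u$ of a period as a random quantity that is zero whenever no precipitation occurs in the area.

```latex
% Assumption: C = E[C_u | precipitation occurs somewhere in the area],
% and C_u = 0 when no precipitation occurs in the area.
\begin{align*}
E[C_u] &= P_a \cdot E[C_u \mid \text{precipitation in area}]
          + (1 - P_a) \cdot 0 \\
       &= P_a \, C \\
       &= \bar{P}_p .
\end{align*}
% With P_a = 1 the first line reduces to E[C_u] = C, so Eq. (1) and
% Eq. (2) give the same value, as stated in the text.
```

Under these assumptions, Eq. (1) is the before-the-fact expectation of the quantity that Eq. (2) measures after the fact.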

e. Measurable vs. Trace⁴

Going back to the definition at the beginning of this section,
let us look at another part of it--measurable. Many forecasters
feel that a trace of precipitation should be included in the
definition of a precipitation event, or that a trace should count as
a hit when the probability is for measurable. The main reason
for using the measurable criterion is that it has been in use for
many years and the users have become accustomed to it. The frequency
of only a trace is generally almost as high as the frequency of
more-than-a-trace (see Hughes 1965). Thus if one were to include
traces as precipitation events, the average forecast probability
would have to almost double to adjust to the change in definition.
This would cause great confusion to users for quite a time, and,
because forecasts are compared to climatology, verification scores
need not be higher since the climatic frequency of a trace or more
would be almost twice as high as that for measurable. Of course,
it would be completely inappropriate to verify using a trace amount
as an event if the forecasts were for a measurable amount. The event
forecast and that used to verify must be identical.

⁴See also Section 8f.

Another  point  is  that  it  is  generally  considered  that  a  trace  amount 
requires  no  protective  action  and  thus  should  be  treated  the  same 
as  no-rain.  This  is  probably  not  true  for  everyone,  but  the 
Weather  Service  must  aim  to  help  the  large  majority,  and  let  the 
others  with  less  common  needs  be  additionally  served  by  private 
weather  services.  Of  course,  the  text  of  the  forecast  gives 
additional  information  on  expected  events.  The  probability  alone 
cannot  and  was  never  intended  to  carry  the  whole  precipitation 
message.  This  point  must  be  kept  in  mind  when  constructing  the 
forecast. 

In  the  wintertime,  snow  flurries  behind  cold  fronts  and  to  the  lee 
of sizable bodies of unfrozen water are very frequent, and commonly
insignificant.  Of  course  there  are  heavy  snow  squalls  at  times  in 
such  conditions,  but  these  are  less  common  than  light  flurries.  To 
call  high  probabilities  for  flurry  trace  events  would  cause  loss  of 
value  for  the  probability  on  other,  more  important,  days. 

f.   Point  probability  vs.  precip.  amount 

Some  will  say  that  the  public  needs  the  probability  of  more 
than  a  measurable  amount  of  precipitation,  and  that  is  true.  The 
problem  is  how  to  get  the  information  into  the  forecast  without 
causing  undue  confusion.  It  could  be  done  in  specialty  forecasts, 
e.g.,  agricultural  and  fire  weather,  because  MOS  guidance  already 
exists  for  QPF  (quantitative  precipitation  forecast)  probabilities 
(Bermowitz  and  Zurndorfer,  1979).  However,  there  is  some  QPF 
information  in  the  probabilities  of  only  measurable  precipitation. 
This  was  shown  for  the  three  forecast  periods  for  one  cold  season 
by  Wasserman  and  Rosenblum  (1972)  for  four  East  Coast  locations 
combined  and  by  Wasserman  (1972)  for  13  NWS  Eastern  Region  locations-- 
see  Table  1.  It  was  also  shown  for  the  first  forecast  period  (today) 
for  the  warm  season  by  Hashemi  and  Decker  (1969)  for  the  four 
Weather  Bureau  offices  in  Missouri—see  Fig.  1. 

To  use  Fig.  1,  find  the  point  of  intersection  between  the  curved 
line appropriate to the forecast probability of a measurable
amount  and  the  vertical  line  of  the  desired  precipitation  threshold, 
and  note  on  the  left  side  of  the  graph  the  probability  of 
precipitation  greater  than  the  threshold.  For  example,  for  a  70% 
forecast  probability  (60-100%  line),  the  chance  of  >  0.60  in.  is 
about  17%. 
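
The same kind of lookup can be sketched for Table 1. The values below are transcribed from the first of the three blocks of Table 1 (only a few rows are included); the dictionary layout and the function name are illustrative assumptions, and the numbers apply only to the stations and season covered by that table.

```python
# Column thresholds of Table 1, in inches ("trace" is the first column).
THRESHOLDS = ("trace", 0.01, 0.11, 0.26, 0.51)

# Relative frequencies from the first block of Table 1, keyed by PoP (%).
TABLE_1_FIRST_BLOCK = {
    0:   (0.10, 0.03, 0.00, 0.00, 0.00),
    30:  (0.54, 0.29, 0.09, 0.02, 0.01),
    50:  (0.73, 0.56, 0.18, 0.04, 0.03),
    70:  (0.90, 0.73, 0.33, 0.12, 0.01),
    90:  (0.97, 0.93, 0.64, 0.39, 0.18),
    100: (1.00, 0.94, 0.80, 0.69, 0.33),
}


def amount_frequency(pop_percent: int, threshold) -> float:
    """Relative frequency of the given amount for a forecast PoP."""
    row = TABLE_1_FIRST_BLOCK[pop_percent]
    return row[THRESHOLDS.index(threshold)]


# With a 70% PoP, the 0.26 in. amount was observed in about 12% of cases.
print(amount_frequency(70, 0.26))
```

A lookup of this sort makes plain that a probability of measurable precipitation already carries some information about larger amounts.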

These data are probably not universal and need to be calculated for
other areas, especially those in the West, and possibly should be
updated from time to time.

                         > AMOUNT INDICATED
PEATMOS
  POP     CASES    TRACE    0.01"    0.11"    0.26"    0.51"

(first block)
  00%      546     0.10     0.03     0.00     0.00     0.00
  10%      452     0.25     0.09     0.01     0.00     0.00
  20%      329     0.41     0.20     0.04     0.02     0.00
  30%      187     0.54     0.29     0.09     0.02     0.01
  40%      165     0.68     0.50     0.12     0.03     0.01
  50%      135     0.73     0.56     0.18     0.04     0.03
  60%      112     0.81     0.58     0.21     0.06     0.01
  70%       84     0.90     0.73     0.33     0.12     0.01
  80%      102     0.92     0.75     0.41     0.17     0.07
  90%      120     0.97     0.93     0.64     0.39     0.18
 100%       80     1.00     0.94     0.80     0.69     0.33
  ALL     2312     0.46     0.31     0.14     0.07     0.03

(second block)
  00%      565     0.12     0.05     0.02     0.00     0.00
  10%      415     0.27     0.13     0.03     0.01     0.01
  20%      341     0.40     0.21     0.05     0.01     0.00
  30%      203     0.62     0.35     0.11     0.03     0.01
  40%      173     0.66     0.43     0.12     0.05     0.01
  50%      151     0.74     0.52     0.22     0.06     0.04
  60%      119     0.87     0.69     0.34     0.18     0.08
  70%      154     0.91     0.74     0.38     0.16     0.06
  80%       97     0.88     0.76     0.44     0.26     0.11
  90%       75     0.93     0.84     0.56     0.47     0.15
 100%       18     0.89     0.89     0.61     0.61     0.44
  ALL     2311     0.46     0.32     0.13     0.07     0.03

(third block)
  00%      365     0.10     0.04     0.01     0.01     0.00
  10%      497     0.24     0.12     0.03     0.01     0.00
  20%      399     0.44     0.25     0.08     0.03     0.02
  30%      250     0.56     0.37     0.08     0.02     0.01
  40%      229     0.59     0.42     0.17     0.09     0.03
  50%      176     0.74     0.53     0.24     0.11     0.07
  60%      216     0.82     0.63     0.34     0.19     0.05
  70%      143     0.84     0.73     0.41     0.27     0.11
  80%       35     0.91     0.83     0.54     0.49     0.23
  90%        1     1.00     0.00     0.00     0.00     0.00
 100%        0     0.00     0.00     0.00     0.00     0.00
  ALL     2311     0.46     0.31     0.13     0.07     0.03

[Row-group labels are illegible in the source; the three blocks
presumably correspond to the three forecast periods.]

Table 1.  Relative frequency of specified 12-hour precipitation
amounts as a function of PEATMOS PoP.  Results are for
13 Eastern Region stations combined, for the period
January 1, 1972 through March 31, 1972.  (Wasserman, 1972)

Figure 1.--Cumulative likelihoods for greater precipitation than indicated
amounts for various "today" probability forecast classes
(from Hashemi and Decker, 1969).

g.  Time  Periods 

The  words  "time  periods  of  the  forecast"  are  also  critical  in  the 
definition.  The  probability  must  pertain  to  a  time  period  that  is 
understood  by  the  user.  It  cannot  pertain  to  an  indefinite  period, 
because  the  probability  usually  decreases  as  the  length  of  period 
decreases  (see  Topil,  1963,  for  an  example).  Thus  a  6  h  probability 
is  almost  always  smaller  than  a  12  h  probability.  Therefore,  the 
probability  for  "this  afternoon"  is,  on  the  average,  smaller  than  for 
"today",  generally  about  two-thirds  the  size  (see  Hughes  1978c  and 
Jorgensen,  1967).  When  updating  the  "today"  forecast  in  late 
morning,  this  must  be  taken  into  account  when  selecting  the 
probability  for  "this  afternoon",  and  rarely  should  the  "today" 
probability  be  unchanged  in  the  update  forecast.  Generally  a  "think 
small"  motto  applies,  because,  as  discussed  in  detail  later,  a 
large  overforecasting  bias  otherwise  results. 

Because  the  probability  pertains  only  to  known  periods  of  finite 
size,  it  changes  discontinuously  when  one  period  ends  and  the  next 
begins.  This  means  that  terms  suggesting  continuous  change,  like 
"decreasing"  should  not  be  used.  At  times  such  usage  also  creates 
uncertainty  as  to  the  length  of  the  forecast  period,  doubly 
aggravating  the  problem.  It  is  preferable  and  far  safer  to  use  the 
standard  format  of,  for  example,  80  percent  this  afternoon  and  30 
percent  tonight.  If  the  first  period  probability  is  omitted  because 
it  is  precipitating  or  is  about  to,  the  probability  is  supposed  to  be 
obviously  high  to  the  forecast  user,  so  omitting  it  without  adding 
any  words  to  the  usual  probability  statement  is  best. 

Another  misuse  of  periods  is  to  combine  them  by  saying,  for  example, 
"probability  30  percent  today  and  tonight."  This  could  create 
confusion  in  the  mind  of  the  user  as  to  whether  the  30  percent 
applies  to  each  period  separately  or  to  them  combined.  Since  they 
do  not  know  our  rules,  which  say  they  will  be  separate,  this  must  be 
made  clear  in  the  text  by  either  using  the  standard  format  or  by 
saying  "Probability  30  percent  both  today  and  tonight."  These  small 
but  important  points  make  it  harder  on  the  forecaster,  but  they  are 
essential  to  the  user's  understanding. 

Some  forecasters  object  to  the  fixed  periods  of  "today"  and  "tonight", 
preferring  periods  more  related  to  the  synoptic  situation,  e.g., 
"late  this  afternoon  and  this  evening"  as  one  period.  This  is  poor 
for  two  reasons.  One,  the  limits  of  the  period,  and  thus  its  length, 
are  not  clear,  so  the  argument  above  applies.  Perhaps  more 
importantly,  the  change  suggested  is  to  satisfy  the  forecaster, 
probably at the expense of the user. This is because the user is
mostly interested in the period "today" or "this afternoon" if
outside  work  is  being  done,  or  in  the  period  "tonight"  if  outside 
play  is  planned,  and  mostly  those  interested  in  one  will  not  be 
interested  in  the  other,  or  they  will  be  interested  in  each 
separately.  That  is,  the  pre-selected  dividing  line  between  periods 
is  fairly  well  suited  to  the  user's  needs  in  general. 

Because  users  are  frequently  interested  in  periods  shorter  than  12  h 
it  is  often  desirable  to  split  a  12  h  period  into  two  6  h  periods. 
This  is  especially  true  when  the  probabilities  in  these  two  periods 
are  significantly  different.  Unfortunately,  forecasters  do  not  do 
this  enough.  However,  and  mostly  in  the  warmer  half  of  the  year, 
many people are interested in the first part of the night period but are not
at  all  interested  after  that,  because  they  have  an  outdoor  evening 
planned  for  a  picnic,  ball  game,  outdoor  theatre,  etc.  So 
forecasters  should  make  the  extra  effort  to  help  the  decision  maker 
in  these  cases  and  subdivide  the  12  h  period.  While  this  is 
probably  more  important  for  the  night  period,  subdivision  of  the 
day  period  will  also  help  decision  makers  more  than  a  12  h  period 
forecast.  Someday  we  may  always  start  a  forecast  with  at  least  one 
6  h  period,  simply  to  provide  better  service. 

When subdividing a 12 h period, when you already have a 12 h
probability in mind, you could use the tables given later in Section
13 "Combining probabilities" to check that the two 6 h probabilities
properly match the 12 h probability you had in mind. This is
simply working the tables in reverse.
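
The Section 13 tables are not reproduced here, so the following Python sketch is only a rough consistency check and not a substitute for them. It relies on two facts of elementary probability: the 12 h probability can be no smaller than the larger 6 h probability and no larger than their sum. The independence formula is an added assumption, included for comparison only; the function names are hypothetical.

```python
def combined_bounds(p_first: float, p_second: float):
    """Limits on the 12 h probability implied by two 6 h probabilities."""
    lower = max(p_first, p_second)          # periods fully dependent
    upper = min(1.0, p_first + p_second)    # events never overlap
    return lower, upper


def combined_if_independent(p_first: float, p_second: float) -> float:
    """12 h probability if the two 6 h events were independent (assumption)."""
    return 1.0 - (1.0 - p_first) * (1.0 - p_second)


def is_consistent(p_12h: float, p_first: float, p_second: float,
                  tol: float = 1e-9) -> bool:
    """True when the 12 h value lies within the limits set by the 6 h values."""
    lower, upper = combined_bounds(p_first, p_second)
    return lower - tol <= p_12h <= upper + tol


# Hypothetical example: 20% for the first 6 h and 40% for the second.
print(combined_bounds(0.2, 0.4))
print(combined_if_independent(0.2, 0.4))
print(is_consistent(0.3, 0.2, 0.4))  # 30% is below the larger 6 h value
```

A 12 h probability that fails this check cannot be matched by any pair of 6 h probabilities of those sizes, whatever dependence exists between the two periods.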

h.  First  period  problems 

The  first  period  probability  has  a  couple  of  troublesome 
aspects  not  found  in  other  periods.  Once  the  first  period  is 
underway,  the  probability  forecast  made  before  the  period  started 
is  no  longer  completely  valid.  The  more  of  the  period  that  has 
passed,  the  less  it  is  valid.  This  is  because  probability  is 
related  to  period  length  (see  Section  8b).  We  have  no  solution  to 
this  problem  as  yet. 

Another  problem  with  the  first  period  probability  is  that  once 
precipitation  has  occurred  at  some  point,  the  probability  no  longer 
has  any  meaning  for  that  point.  However,  it  can  still  have  meaning 
for  locations  where  precipitation  has  not  occurred.  In  talks  to 
users,  this  point  could  be  made,  because  some  users  do  not  understand 
it.  It  is  not  a  prominent  point,  nor  one  that  has  an  effect  on 
decision  making  because  once  precipitation  has  occurred,  the  user 
no  longer  has  to  make  a  decision  concerning  whether  or  not  it  will 
precipitate.  This  problem  exists  no  matter  what  type  of  precipitation 
forecast  is  made,  probabilistic  (numbers  or  words)  or  categorical. 
If it is caused by a widespread rain area moving in unexpectedly
or earlier than expected, the solution would be to update the
forecast and, since rain has started, omit the probability number.
If spotty showers are the cause, the solution is more difficult.
User education is the best solution for now. A future possibility
would be to update the forecast removing the probability but
expressing the uncertainty in areal coverage terms, e.g., scattered
showers.

i.  Probability  vs.  odds 

Probability  and  odds  are  the  same  thing,  but  odds  can  easily 
be  more  confusing.  When  odds  are  stated  as,  for  example,  1  to  4 
for  rain,  it  means  one  chance  for  rain  and  four  against  rain.  The 
probability  this  relates  to  is  given  by  1/(1  +  4)  or  20%.  Odds  of 
3  to  7  would  be  3/(3  +  7)  or  30%.  Probabilities  can  be  converted 
to  odds  easily  when  odds  are  expressed  as  1  to  "something."  To 
find  what  the  "something"  is  for  a  particular  probability,  divide 
the probability into 1 and subtract 1. For example, 20% yields
(1/.2) - 1 = 5 - 1 = 4, thus odds of 1 to 4. For 30%, we get
(1/.3) - 1 = 2.33, thus odds are 1 to 2.33. To express this
differently, we could take 3 x (1 to 2.33) or 3 to 7.
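
The conversions just described can be sketched in Python with exact fractions, which avoids the rounding seen in "1 to 2.33". The function names are illustrative assumptions; the arithmetic follows the formulas in the paragraph above.

```python
from fractions import Fraction


def odds_to_probability(chances_for: int, chances_against: int) -> Fraction:
    """Odds of 'for to against' correspond to for / (for + against)."""
    return Fraction(chances_for, chances_for + chances_against)


def one_to_something(probability: Fraction) -> Fraction:
    """The 'something' in odds of 1 to something: (1 / p) - 1."""
    if not 0 < probability < 1:
        raise ValueError("odds are undefined for probabilities of 0 or 1")
    return 1 / probability - 1


def probability_to_odds(probability: Fraction):
    """Whole-number odds (for, against) in lowest terms."""
    if not 0 < probability < 1:
        raise ValueError("odds are undefined for probabilities of 0 or 1")
    ratio = probability / (1 - probability)
    return ratio.numerator, ratio.denominator


print(odds_to_probability(1, 4))                # 1 to 4 odds
print(one_to_something(Fraction(3, 10)))        # the "2.33" of the text
print(probability_to_odds(Fraction(3, 10)))     # 30% as whole-number odds
```

Note that probabilities of exactly zero or one have no finite odds, which the sketch rejects explicitly.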

Clearly  probability  is  simpler.  In  fact  the  confusion  of  odds  is 
such  that  unethical  betting  called  "Dutch  Book"  can  be  done.  In 
such  a  case  the  unethical  gambler  bets  for  or  against  all  possible 
events,  one  of  which  must  occur,  but  he  gives  odds  such  that  he 
must  win  no  matter  what  occurs.  If  the  odds  he  gives  for  each 
possible  event,  when  converted  to  probability,  do  not  sum  to  one, 
someone has a Dutch Book bet and can't lose overall (see H. Roberts).
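
The check described above, converting each quoted set of odds to a probability and summing, can be sketched as follows. The two sets of quoted odds are hypothetical examples, and the function names are assumptions made for illustration.

```python
from fractions import Fraction


def implied_probability(chances_for: int, chances_against: int) -> Fraction:
    """Probability implied by odds of 'for to against'."""
    return Fraction(chances_for, chances_for + chances_against)


def book_total(quoted_odds) -> Fraction:
    """Sum of implied probabilities over all possible events."""
    return sum(implied_probability(f, a) for f, a in quoted_odds.values())


# Hypothetical odds over the two possible events, rain and no rain.
consistent_book = {"rain": (1, 4), "no rain": (4, 1)}
skewed_book = {"rain": (1, 4), "no rain": (9, 1)}

print(book_total(consistent_book))  # sums to one
print(book_total(skewed_book))      # does not sum to one
```

Only the first set of odds passes the test in the text; the second is the kind of quotation that opens the door to a Dutch Book.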

The  correct  probability  for  a  particular  forecast  is  one  in  which 
the  forecaster,  betting  according  to  the  proper  odds,  would  have  no 
preference  for  one  side  or  the  other  in  the  bet.  Thus  for  a  20% 
forecast,  the  forecaster  would  have  no  preference  either  to  put  up 
$1  for  rain  or  $4  against  rain.  If  there  is  a  preference,  the 
probability  does  not  truly  reflect  the  forecaster's  assessment.  It 
is  likely  that  fewer  extremely  high  or  especially  low  probabilities 
would  be  used  if  the  forecaster  was  forced  to  wager  on  them.  At 
times I have checked this out for 0% and 100% (see section 3d).
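
The "no preference" condition can be illustrated with a short expected-gain calculation in Python. This is a sketch under the assumption that a bettor is indifferent when the expected gain is zero; the function name and the 30% example are hypothetical.

```python
from fractions import Fraction


def expected_gain(p_win: Fraction, stake, winnings) -> Fraction:
    """Expected gain of a bet: win 'winnings' with p_win, else lose 'stake'."""
    return p_win * winnings - (1 - p_win) * stake


p_rain = Fraction(1, 5)  # the 20% forecast of the text, odds of 1 to 4

# Putting up $1 for rain (to win $4) or $4 against rain (to win $1)
# both have zero expected gain when 20% is the true assessment.
print(expected_gain(p_rain, stake=1, winnings=4))
print(expected_gain(1 - p_rain, stake=4, winnings=1))

# If the forecaster really believes 30%, the rain side becomes attractive,
# which shows that the stated 20% did not reflect the true assessment.
print(expected_gain(Fraction(3, 10), stake=1, winnings=4))
```

A nonzero expected gain on either side is the "preference" the paragraph describes.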


5.   DETERMINING  PROBABILITIES 

a.  Objective  scheme  and  guidance 

An  early  objection  to  starting  a  probability  program  was  that 
there  were  no  objective  schemes  to  aid  in  creating  probabilities. 
That  argument  has  absolutely  no  validity  because  use  of  probability 
is  not  related  to  how  the  forecast  was  made.  As  long  as  users 
want  specific  information,  and  they  do,  and  the  forecasters  can 
sense differing certainties, and they can, certainty is best
expressed in probability numbers. It may take practice and feedback
from a verification to maximize the effort, but probability
forecasting can be done well subjectively. That has now been
shown many times. Murphy (1978c) describes the relation of the
forecaster's judgement to probability.

One  advantage  of  an  objective  scheme  is  that  it  will  give  the  same 
forecast  on  a  particular  situation  no  matter  which  forecaster  is  on 
duty.  When  several  forecasters  are  on  duty  at  the  same  time, 
their  subjective  probability  estimates  for  a  situation  need  not  be 
the same. Some feel that this is a reason to not make probability
forecasts,  especially  since  the  forecasters  could  give  widely 
varying  probabilities  at  times. 

While  it  is  possible  to  get  widely  varying  probabilities  under 
these  conditions,  experience  suggests  that  it  is  rare,  especially 
with  the  tempering  influence  of  MOS  guidance.  H.  Roberts  (1968) 
indicated  that  differing  probabilities  need  not  reflect  on 
probability  forecasting  when  he  said,  "The  act  of  assessing  a 
probability  can  be  itself  thought  of  as  a  decision,  and  it  is 
commonplace  that  different  people  do  not  make  the  same  decision 
when  confronted  with  the  same  circumstances."  He  went  on,  "This 
divergence  is  not  a  symptom  of  a  weakness  in  the  concept  of 
subjective  probability;  after  all,  in  the  days  of  deterministic 
forecasts,  meteorologists  sometimes  disagreed." 

The  situation  is  similar  for  guidance  probability  forecasts.  Such 
forecasts  are  not  necessary  but  they  can  be  helpful.  The  success 
of early experiments, as discussed just above and made before the
advent of such guidance, proves this point. However, guidance exists
in Model Output Statistics (MOS), and it is quite good (Glahn et
al., 1978), so that neglect of MOS could cause poorer scores. How
should guidance be used? More on this in section 8-10, but for now,
there  are  two  ways.  One,  probably  the  way  of  the  long-experienced 
forecaster,  is  to  make  a  complete  and  independent  forecast  including 
that  of  the  probability  before  looking  at  the  specific  guidance 
for  probability.  If  the  two  forecasts  are  essentially  the  same, 
the  final  forecast  is,  at  most,  a  minor  adjustment  of  the  initial 
forecast.  If  the  two  forecasts  are  markedly  different,  the 
forecaster  must  re-examine  his  conclusions  to  see  how  far  from 
guidance  he  will  finally  go.  It  is  greatly  to  the  forecaster's 
advantage  to  make  a  major  change  in  the  guidance  probability,  PROVIDED 
it  is  a  change  in  the  right  direction. 

The  other  way,  probably  the  way  of  the  less  experienced  forecaster, 
is  to  look  at  the  guidance  value  early,  and,  in  the  forecast 
preparation,  decide  which  way  and  how  much  to  deviate  from  the 
guidance  value.  The  first  way  is  preferable  because  the  forecaster 
is  more  likely  to  make  a  large  deviation  when  skill  allows,  instead 
of  being  initially  biased  by  the  guidance  value.  It  is  quite 
possible that the use of the second method (looking at guidance
first) is contributing to the leveling off or decline in scores
that has been reported by some over the past few years, e.g., Sanders
(1973), Cook and Smith (1977), Snellman (1977), and Ramage (1978).
Forecasters  must  try  to  maximize  their  input  in  order  to  remain 
ahead  of  the  ever-improving  MOS  guidance. 

b.  Consensus 

A  possible  way  to  improve  the  forecast  is  to  use  a  consensus 
forecast.  Such  a  forecast  is  the  average  probability  of  a  group, 
such  as  those  at  a  map  discussion  or  those  on  a  shift.  It  is 
obvious  after  a  moment's  reflection  that  on  each  occasion,  the 
consensus  forecast  is  better  than  someone,  but  is  also  worse  than 
someone.  However,  a  number  of  experiments  have  shown  consensus  to 
be best in the long run: Munn (1953), Sanders (1967), Thompson (1977),
and Winkler et al. (1977). Some of this gain may be artificial,
induced simply by the fact that formal scoring was being done. This is
because, as Sanders notes, the high scorer of the group tends to be
underconfident⁵ in order not to lose a large amount at any one time, and
thus  hopes  to  stay  ahead,  while  those  behind  are  overconfident, 
hoping  for  a  big  score  and  a  gain  on  the  leader.  However,  both 
these  strategies  are  detrimental  to  the  forecaster  compared  to 
consensus.  Winkler  et  al.  (1977)  noted  that  the  larger  the  number 
of  participating  forecasters,  the  better  is  consensus,  while 
Thompson  (1977)  gave  a  mathematical  proof  that  a  consensus  forecast 
should  be  better. 
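
The sense in which averaging helps can be illustrated numerically. The sketch below uses invented probabilities from three hypothetical forecasters (not data from any of the cited experiments) and the squared-error score described in section 7a. It checks that the consensus score is never worse than the average of the individual scores, which follows from the convexity of the squared error. It is not a reproduction of Thompson's proof.

```python
# Illustrative only: the forecasts and outcomes below are invented.

def brier(forecasts, outcomes):
    """Mean squared error between probabilities (0-1) and outcomes (0 or 1)."""
    return sum((f - e) ** 2 for f, e in zip(forecasts, outcomes)) / len(outcomes)

# Rows are forecasters, columns are forecast occasions.
individual = [
    [0.3, 0.9, 0.5, 0.1, 0.6],
    [0.1, 0.5, 0.9, 0.4, 0.7],
    [0.2, 0.7, 0.7, 0.1, 0.4],
]
outcomes = [0, 1, 1, 0, 1]  # 1 = precipitation occurred

# Consensus = average probability of the group on each occasion.
consensus = [sum(col) / len(col) for col in zip(*individual)]

individual_scores = [brier(row, outcomes) for row in individual]
mean_individual_score = sum(individual_scores) / len(individual_scores)
consensus_score = brier(consensus, outcomes)

print("individual scores:", [round(s, 4) for s in individual_scores])
print("mean of individual scores:", round(mean_individual_score, 4))
print("consensus score:", round(consensus_score, 4))
```

In this invented sample the consensus also happens to beat each individual, but only the comparison with the average individual score is guaranteed in general.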

c.  Forecast  principles 

The  many  aspects  of  physical,  dynamical,  and  kinematical 
meteorology  pertinent  to  making  probability  forecasts  are  far 
beyond  the  scope  of  this  paper,  but  a  few  points  can  be  made.  A 
paper  by  Bennett  et  al.  (1968)  attempted  this  for  the  layman,  and 
their  figures,  one  of  which  is  shown  here  as  Fig.  2,  gave  the  space 
variation  of  probability  for  half  a  dozen  idealized  synoptic 
situations,  mostly  winter  types.  This  really  conveys  the  concept 
to  the  non-forecaster  rather  than  helping  the  forecaster,  because 
of  the  vast  number  or  variety  of  synoptic  situations.  Nevertheless, 
the  concept  is  valid  for  the  forecaster. 

Figure 2.--Map of probabilities: warm front approaching from
southwest (from Bennett et al., 1968).

Basically, thinking about Eq. (1) can be useful, and this is what
Bennett et al. were trying to show. To do this, first estimate the
areal probability, $P_a$, by considering the chance that an existing
precipitation area will move into your forecast area. Then,
regardless of how small that chance is, as long as it is not zero,
estimate what areal coverage, C, you might have if the
precipitation area does indeed reach your area. For example, you
might say that there is only a small chance, say 20%, that the
upstream precipitation area will reach you in a particular forecast
period, but it is expected to be widespread rain with 100% areal
coverage. The forecast would then be 20%. On the other hand, you
might be 100% certain that showers will occur in the forecast period
in your area ($P_a$ = 100%), but they will be sparse, say only 20%
coverage in the forecast period (not instantaneous, but over the
whole  period).  Again  the  forecast  would  be  20%.  Of  course, 
precipitation  areas  are  born,  mature  and  die,  and  this  must  be 
considered.  To  help  with  this,  one  should  take  care  to  understand, 
as  physically  as  possible,  the  cause  for  precipitation  on  the  initial 
chart  and  the  factors  that  operate  to  change  this  situation. 
These  are  discussed  by  Hughes  (1977a). 
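
The two examples above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes Eq. (1) has the product form implied by those examples (point probability = areal probability times expected areal coverage); the function name is illustrative only.

```python
def point_probability(p_areal, coverage):
    """Point probability from Eq. (1), assumed here to be P = P_a * C.

    p_areal  -- chance (0-1) that precipitation occurs somewhere in the area
    coverage -- expected fraction (0-1) of the area covered over the whole
                forecast period, given that precipitation does occur
    """
    if not (0.0 <= p_areal <= 1.0 and 0.0 <= coverage <= 1.0):
        raise ValueError("both inputs must lie between 0 and 1")
    return p_areal * coverage

# Small chance of widespread rain reaching the area.
widespread = point_probability(0.20, 1.00)
# Showers certain somewhere in the area, but sparse.
scattered = point_probability(1.00, 0.20)

print(f"{widespread:.0%}, {scattered:.0%}")
```

Both situations lead to the same forecast number even though the weather expected is quite different.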

The  first  part  of  the  forecast  is  where  the  forecaster  should 
concentrate,  both  because  it  is  probably  most  useful  to  the  decision 
maker  and  because  that  is  where  the  greatest  advantage  over 
guidance  lies.  The  latter  is  because  the  skill  of  the  forecaster 
generally  decreases  faster  than  that  of  guidance  and  also  the 
forecaster  has  additional  and  later  data  over  that  used  for  the 
guidance  forecast,  and  the  value  of  observations  decreases  with 
time. 

When judging areal coverage from radar, one must be careful to
properly  interpret  the  scope.  Due  to  spreading  of  the  beam  with 
distance,  echoes  at  a  distance  are  enlarged  from  what  they  would  be 
in  close,  and  at  times  the  individual  cells  may  merge  on  the  scope. 
This  false  solidity  of  a  line  of  echoes  can  easily  lead  one  to 
issue unrealistically high probabilities for the first period of
the  forecast,  because  as  the  area  or  line  of  echoes  comes  closer  it 
usually  breaks  up  into  separate  cells,  with  non-rain  areas  between. 
The  cells  may  again  merge  into  a  solid  line  as  the  echoes,  after 
passing  the  station,  are  again  at  some  distance,  and  beam  width 
effects  are  again  pronounced. 

A  similar  effect  occurs  to  the  person  outdoors  looking  at  the 
showers  in  the  area,  especially  in  areas  where  one  can  see  a 
considerable  distance  in  all  directions.  In  this  case,  it  appears 
that  the  showers  are  almost  always  avoiding  the  local  area,  and  the 
uninitiated  may  look  for  a  physical  effect  and  tend  to  improperly 
lower  probabilities  in  the  forecast.  This  effect,  the  encirclement 
illusion,  is  discussed  by  McDonald  (1959).  It  is  caused  by  the 
fact that one can see a very large area, but experience the rain
only  in  a  small  area.  Since  the  areal  probability  gets  larger  as 
the  area  gets  larger,  the  effect  is  inevitable. 

d.  Probability  in  body  of  forecast 

The  problem  of  where  to  put  the  probability  number,  in  the 
body  of  the  forecast  or  at  the  end,  is  still  being  discussed  (see 
Pielke  1977).  At  the  beginning  of  the  NWS  national  program  in 
1965  it  was  decided  to  place  it  at  the  end,  as  was  done  in  earlier 
probability  efforts  (see  Section  2),  mainly  because  this  would 
leave the worded forecast unchanged for those who did not want
the added information given by the probability or were confused by
it.  While  this  is  a  valid  point,  there  are  other  reasons  for  the 
same  action.  At  first  thought,  the  body  of  the  forecast  seems  the 
best  place  for  the  number.  Why  say  "chance  of  rain"  in  the  body 
and  then  put  the  probability  at  the  end,  when  it  would  be  better  to 
say  "40%  chance  of  rain"  in  the  body  of  the  forecast  and  be  done 
with  it?  A  good  point.  However,  these  days  it  is  felt  that  the 
mission  of  the  forecaster  is  to  give  detail  in  the  forecast,  if 
possible, especially in the first part of the forecast. If
such detail is given, it is very easy to create confusion as to
which part of the forecast the probability applies to. This was the
problem  with  Pielke's  (1977)  suggestion,  as  noted  by  Hughes  (1978a). 
Putting  the  probability  in  the  body  of  the  forecast  can  be  easily 
and  correctly  done  under  certain  conditions,  but  the  problem  is  so 
complex  that  to  expect  the  many  hundreds  of  NWS  forecasters  who 
make  forecasts  each  day  to  consistently  do  it  right  is  unrealistic. 
Also, such action would discourage adding detail, and it is more
important for the forecaster to add detail in the body of the
forecast than to have the probability there. Such detail pertains
to  the  amount  of  precipitation.  The  main  hope  for  putting  the 
probability  in  the  body  lies  in  the  computer  worded  forecasts  of 
Glahn  (1978)  and  Heffernan  and  Glahn  (1979).  Computers  are  certainly 
dumb,  but  they  follow  rules  exactly  (barring  malfunctions),  even 
though  the  rules  are  complex. 

6.  TYPES  OF  PROBABILITY  FORECASTS 

a.  Point  vs.  areal  probability  -  again 

Point vs. areal probability was discussed in section 4d. Let
it  be  stressed  again  that  before  making  or  using  a  probability  one 
must  know  whether  it  is  a  point  or  areal  probability,  and  if  an  areal 
one,  for  how  large  an  area.  This  is  because  the  areal  probability 
is  almost  always  the  larger  no  matter  what  the  area,  and  the  larger 
the  area,  generally  the  larger  the  areal  probability.  In  most 
cases,  the  point  probability  depends  relatively  little  on  the  size 
of  the  area  over  which  it  is  an  average. 

The point probability is usually more useful for common or widespread
events such as precipitation, and areal probability is used for less
common events or those which affect a small area, such as tornadoes
or severe weather in general. The reason for using areal probability
for some things, even though it is less preferable for decision
making, is the psychological effect of the extremely low point
probabilities. For example, for a tornado, the average tornado box
of SELS is about 28,000 sq. miles, while the area covered by an
average tornado is only about 0.2 sq. mile. Then, even if SELS were
certain that at least one tornado would occur in the box ($P_a$ = 100%)
and even if they expected several tornadoes in the box, using Eq.
(1)  given  earlier,  one  can  see  that  the  best  point  probability  for 
a  box  would  be  a  tiny  fraction  of  one  percent.  Such  small 
percentages  would  not  be  sufficiently  impressive,  and  changes  from 
one  case  to  another,  even  by  a  factor  of  10  might  not  be  appreciated. 
Of  course,  one  could  inflate  the  numbers  by  multiplying  by  a  large 
constant,  as  is  done  with  baseball  batting  averages,  but  using  a 
fairly  large  forecast  area  (SELS  box)  of  about  the  same  size  each 
time  and  using  areal  probability  tends  to  do  the  same  thing,  although 
with  less  precision. 
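
To put a number on "a tiny fraction of one percent", the sketch below applies the same assumed product form of Eq. (1) to the box and tornado areas quoted above. The count of five tornadoes and the assumption that their paths do not overlap are illustrative choices, not values from the text.

```python
BOX_AREA_SQ_MI = 28_000.0   # average SELS tornado box (from the text)
TORNADO_AREA_SQ_MI = 0.2    # area covered by an average tornado (from the text)

def tornado_point_probability(p_areal, n_tornadoes):
    """Point probability = P_a * C, with C the fraction of the box affected.

    Assumes the tornado paths do not overlap, so coverage grows linearly
    with the number of tornadoes.
    """
    coverage = n_tornadoes * TORNADO_AREA_SQ_MI / BOX_AREA_SQ_MI
    return p_areal * coverage

one = tornado_point_probability(1.0, 1)
five = tornado_point_probability(1.0, 5)  # "several" tornadoes; 5 is arbitrary

print(f"one tornado:    {one * 100:.5f}%")
print(f"five tornadoes: {five * 100:.5f}%")
```

Even with the areal probability at 100%, both values are far below one hundredth of one percent.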

On  the  other  hand,  some  low  frequency  and  low  area  events  can  be 
forecast  with  high  enough  probability  to  use  the  point  probability, 
such as for the strong downslope winds in the Boulder area and the
Lake  Michigan  seiche  in  the  Chicago  area. 

Statistical  techniques  for  probability  represent  a  special  case 
because  at  times  the  areal  probability  is  used  simply  because  the 
data  supply  does  not  give  enough  events  at  a  point  for  the  scheme 
to  be  stable.  Using  areal  probability  is  a  way  to  increase  the 
number  of  events  to  statistically  meaningful  size.  Another  method 
for  handling  the  variation  in  very  small  probabilities,  although 
this  has  not  been  done  as  yet,  is  to  express  the  chance  of  the 
event  as  the  number  of  times  the  probability  is  higher  than 
climatology,  e.g.,  "the  probability  of  a  tornado  today  is  20  times 
normal." However, the cost/loss ratio principle discussed earlier
is  more  awkward  to  apply  in  such  cases. 

b.  For  severe  storms 

An  experiment  by  Murphy  and  Winkler  (1977b)  has  shown  that  the 
Severe  Local  Storms  (SELS)  forecasters  have  the  ability  to 
successfully  put  areal  probabilities  on  their  severe  storm  outlooks 
and  watches,  but  the  ability  to  specify  in  addition  the  probability 
that  the  condition  will  be  widespread  or  especially  severe  is  more 
limited.  However,  with  the  feedback  of  an  appropriate  verification, 
one  would  expect  that  even  more  reliable  forecasts  could  easily  be 
made.  The  Techniques  Development  Laboratory  has  developed 
objective  areal  probability  guidance  for  thunderstorms  and  severe 
thunderstorms  (Charba  1977,  Reap  and  Foster  1977,  Charba  1979). 
The  latter  is  a  conditional  probability  in  that  it  was  derived  using 
only  cases  with  thunderstorms.  Thus,  in  using  it,  a  thunderstorm 
must  be  in  existence  or  certain  to  occur  for  the  probability  to  be 
taken  at  full  value.  Because  these  are  areal  probabilities,  one 
must know the size of the area to which they apply in order to properly
interpret  the  probability. 

c.  For  continuous  variables 

Point  or  areal  probabilities  can  be  made  for  a  particular 
moment, e.g., for the occurrence of precipitation, fog, or ceiling
below 1000 ft at a particular observation time. This type of
probability  is  best  applied  to  a  continuous  variable  such  as 
ceiling  height,  as  done  by  MOS  (National  Weather  Service,  1978), 
rather  than  to  discontinuous  ones  such  as  precipitation.  The 
continuous  variable  is  split  into  ranges,  thus  increasing  the 
chance  of  the  event  over  that  for  a  specific  value.  The  main 
problems  then  arise  when  the  climatological  frequency  of  a  selected 
range is very small. Using ranges with equal climatic frequency
would  be  ideal  for  development,  but  this  is  frequently  not  of 
operational  use  by  decision  makers. 

d.  For  Maximum  or  Minimum  Temperature 

Another way to handle ranges of a variable is to make probability
forecasts for the maximum or minimum value of a continuous variable
within a selected time period. This was done for temperature by
NWS forecasters at Detroit and Denver in experiments by Peterson et
al. (1972) and Murphy and Winkler (1974a), respectively. There are
two  ways  in  which  such  probabilities  can  be  expressed  (or  implied) 
and  both  were  used  in  the  latter  experiments.  One  had  a  preselected 
range  of  temperature  essentially  centered  on  the  forecast  maximum 
(both  a  5°F  and  9°F  range  were  used),  and  the  forecaster  gave  the 
probability  of  the  maximum  temperature  lying  in  that  range.  The 
other  had  a  preselected  probability,  and  the  forecaster  adjusted 
the  range  of  temperatures  around  the  expected  maximum  so  as  to  fit 
that  probability.  In  these  experiments,  the  probabilities  were  50% 
and  75%,  although  67%  may  be  a  reasonable  value  as  well.  For 
example,  if  one  is  forecasting  at  the  50%  level,  the  range  of 
temperatures  forecast  around  the  expected  maximum  is  just  large 
enough  so  that  the  observed  temperature  will  be  in  the  forecast 
range  50%  of  the  time.  The  higher  the  probability  selected,  the 
larger  the  temperature  interval  forecast  in  any  particular  case. 

An  orderly  scheme  was  given  to  the  forecasters  for  determining  the 
size  of  the  interval  for  a  particular  preselected  probability, 
starting  with  the  most  likely  temperature  (see  Peterson  et  al.  1972). 
The  forecasters  had  unexpectedly  good  reliability  in  these 
experiments,  using  both  the  fixed  and  variable  range.  It  was 
unexpected  because  it  was  the  first  time  the  forecasters  had  done 
this  type  of  probability  forecasting,  and  they  had  no  formal 
verification  to  provide  feedback  on  bias. 

e.  Frost  and  Freeze 

Another  type  of  temperature  forecasting  in  probability  terms  is 
being done by Gregg (1977). This is part of the fruit-frost program
in New Mexico. The forecasts give the probability of the minimum
temperature at selected points reaching 28°F or lower. When a
particular threshold has much more meaning than other temperatures,
this approach is better than the one discussed above.

An  expanded  version  of  Gregg's  approach  is  now  in  use  for  forecasts 
of  ceiling  and  visibility  in  the  MOS  system  (National  Weather  Service, 
1978).  Ranges  of  these  parameters  that  are  operationally  meaningful 
are  selected  in  advance,  and  the  probability  of  each  range  is  then 
forecast. Obviously, from these data one can get the probability of
the variable being above or below any threshold that is an upper or
lower limit of any range. The problem with using such ranges is in
selecting operationally meaningful ones. The number of such ranges
to define, especially in an objective scheme, is dependent only on
the sample size; using many ranges takes a larger sample.
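
Getting a threshold probability from range probabilities is just a running sum. The sketch below shows the idea with hypothetical ceiling ranges and invented probabilities; they are not the operational MOS categories.

```python
from itertools import accumulate

# Hypothetical ceiling ranges (upper limits in feet) and forecast probabilities.
# Each range runs from the previous limit up to, but not including, its own.
upper_limits_ft = [200, 500, 1000, 3000, float("inf")]
range_probs = [0.05, 0.10, 0.25, 0.35, 0.25]

assert abs(sum(range_probs) - 1.0) < 1e-9, "range probabilities must sum to 1"

# Running total: probability that the ceiling is below each upper limit.
prob_below = dict(zip(upper_limits_ft, accumulate(range_probs)))

def prob_at_or_above(threshold_ft):
    """Probability of a ceiling at or above a threshold that is a range limit."""
    if threshold_ft not in prob_below:
        raise ValueError("threshold must coincide with a range limit")
    return 1.0 - prob_below[threshold_ft]

print(f"P(ceiling < 1000 ft)  = {prob_below[1000]:.2f}")
print(f"P(ceiling >= 1000 ft) = {prob_at_or_above(1000):.2f}")
```

A threshold that falls inside a range cannot be recovered this way, which is why the choice of range limits matters operationally.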

f.  Categorical  from  Probabilistic  and  Reverse 

Creating  categorical  forecasts  from  probability  forecasts  is 
easily  and  precisely  done.  Simply  select  a  threshold,  like  50%,  and 
all  forecasts  of  that  probability  and  higher  are  categorical  "yes" 
forecasts  and  the  remainder  are  "no"  forecasts.  Any  threshold  may 
be  chosen,  depending  on  what  characteristics  one  wants  the  forecasts 
to  have.  Ways  to  do  this  for  various  scores  are  discussed  by  Glahn 
and  Bocchieri  (1972)  and  Glahn  (1974). 
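
The conversion can be sketched in a few lines. The probabilities and outcomes below are invented for illustration; the point is only that the number of "yes" forecasts depends directly on the threshold chosen.

```python
def to_categorical(probabilities, threshold=0.5):
    """Return True ("yes") for each probability at or above the threshold."""
    return [p >= threshold for p in probabilities]

# Invented probabilities and outcomes for ten forecast periods.
probabilities = [0.1, 0.2, 0.4, 0.4, 0.5, 0.6, 0.3, 0.8, 0.4, 0.0]
observed_rain = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0]

for threshold in (0.5, 0.4):
    yes_count = sum(to_categorical(probabilities, threshold))
    print(f"threshold {threshold:.0%}: {yes_count} 'yes' forecasts, "
          f"{sum(observed_rain)} observed events")
```

Lowering the threshold raises the number of "yes" forecasts, so the threshold controls how the count of forecast events compares with the count observed.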

Early in the probability program of the NWS, and later for some
offices,  forecasters  continued  to  make  categorical  forecasts  of 
precipitation  for  selected  locations  in  their  forecast  area  of 
responsibility.  These  were  for  verification  only,  and  had  the 
stated  threshold  of  50%  for  a  categorical  "yes".  These  forecasts 
always  had  about  the  same  number  of  rain  forecasts  as  there  were 
observed  rains.  But  when  the  concurrent  forecasts  in  probability 
numbers  were  converted  to  categorical  forecasts  using  the  50% 
criterion,  it  was  noticed  by  Hughes  (1965)  and  others  that  the  number 
of  rain  events  forecast  was  considerably  less  than  the  number 
observed.  To  make  the  number  the  same  would  require  lowering  the 
threshold  at  least  as  far  down  as  40  percent.  This  suggests  that 
categorical  forecasters  have  difficulty  adhering  to  a  selected 
threshold.  This  is  critical  when  trying  to  compare  categorical 
forecasts  with  probabilistic  ones  by  converting  the  latter  using  a 
threshold.  It  has  been  shown  by  Hughes  and  Sangster  (1970)  that 
this  is  not  worthwhile  because  a  discontinuity  occurs  at  the  start 
of  the  probability  forecasts  if  the  50%  threshold  is  used.  Adjusting 
the  threshold  downward  so  as  to  match  that  probably  used  by  the 
categorical  forecasts  is  so  subjective  as  to  make  comparisons  have 
little  meaning. 

Making  probability  forecasts  out  of  categorical  forecasts  cannot  be 
done  well.  A  way  to  do  this  is  given  by  C.  Roberts  (1965),  but  the 
result is crude at best.

7.  HOW  VERIFIED

The  main  use  of  verification  data  by  or  for  the  forecaster  is  to 
locate,  understand,  and  then  reduce  bias  in  order  to  get  better 
forecasts  in  the  future.  A  comprehensive  verification  of  this  sort 
has  been  run  by  the  NWS  Central  Region  for  over  13  years,  and  its 
use  has  been  discussed  in  detail  by  Hughes  (1967b,  1979). 
Verification  also  measures  ability  or  utility.  The  next  six  sections 
relate  to  all  these  points. 

Verification  is  more  important  in  probability  than  in  other  types 
of  forecasting  because  the  quality  of  the  forecast  is  not  obvious 
the  next  day.  The  only  way  to  judge  whether,  say,  the  40% 
probability  is  well  forecast  is  to  group  a  number  of  such  forecasts 
and  compare  forecast  and  observed  precipitation  frequencies.  Large 
biases  can  easily  exist  without  a  verification,  and  they  can  even 
exist  with  a  verification,  as  discussed  in  section  8. 
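
The grouping described above amounts to building a small reliability table. The following is a minimal sketch with an invented sample; the function name and data are illustrative and are not taken from the Central Region verification program.

```python
from collections import defaultdict

def reliability_table(forecasts, outcomes):
    """Group forecasts by stated probability and return, for each,
    (number of forecasts, observed frequency of the event)."""
    counts = defaultdict(lambda: [0, 0])  # probability -> [forecasts, events]
    for f, e in zip(forecasts, outcomes):
        counts[f][0] += 1
        counts[f][1] += e
    return {f: (m, r / m) for f, (m, r) in sorted(counts.items())}

# Invented sample: ten 40% forecasts with rain on four,
# and five 10% forecasts with rain on one.
forecasts = [0.4] * 10 + [0.1] * 5
outcomes = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0] + [0, 0, 1, 0, 0]

table = reliability_table(forecasts, outcomes)
for prob, (m, freq) in table.items():
    print(f"forecast {prob:.0%}: {m} forecasts, observed frequency {freq:.0%}")
```

With so few cases per probability the observed frequencies are very noisy; a real verification needs a much larger sample in each group.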

a.  Brier  Score 

The  universally  accepted  verification  score  for  probability 
number  forecasts  is  the  score  of  Brier  (1950)  which  is  usually 
modified  for  two  categories  to  be: 

$$P = (F - E)^2 \qquad (3)$$

This  form  is  for  calculating  the  score  for  a  single  forecast.  P  is 
the Brier score⁶, F is the forecast probability expressed as a
decimal  fraction,  and  E  is  either  1  or  0  depending  on  whether  the 
event  occurs  or  does  not  occur  respectively  at  the  verifying  point 
or  area  as  appropriate.  One  can  easily  see  that  when  1  or  0  is 
taken  as  perfection,  the  score  is  simply  the  error  squared.  When 
the  average  of  a  number  of  forecasts  is  taken,  the  score  is  the 
mean  square  error.  Since  the  score  is  a  measure  of  error,  a  low 
score is better, with the actual range being 0 to 1. This score is
discussed  in  detail  by  Hughes  (1965,  1967c).  Some  of  this  detail 
is  given  below. 
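
As a concrete illustration of Eq. (3) and its average, here is a minimal sketch with an invented set of five forecasts; the function names are illustrative.

```python
def brier_single(f, e):
    """Eq. (3): squared error of one forecast probability f against outcome e (1 or 0)."""
    return (f - e) ** 2

def brier_mean(forecasts, outcomes):
    """Average of Eq. (3) over a set of forecasts: the mean square error."""
    return sum(brier_single(f, e) for f, e in zip(forecasts, outcomes)) / len(forecasts)

# Invented sample of five forecasts.
forecasts = [0.9, 0.2, 0.5, 0.0, 0.7]
outcomes = [1, 0, 1, 0, 0]

print(round(brier_single(0.9, 1), 4))
print(round(brier_mean(forecasts, outcomes), 4))
```

A 50% forecast scores 0.25 whether or not the event occurs, and a forecast of 0 or 1 scores either 0 or 1.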

⁶ This form is actually half the Brier score. The full score would
add the score for the probability of no-rain. Since the scores for
rain and no-rain are always exactly the same, the score for the
no-rain is dropped for convenience. In other than the dichotomous
situation, the full Brier score should be used.

This score has several favorable attributes. The first, which is
the goal of every verification scheme (see Brier and Allen 1950)
but is a characteristic of perhaps no other, is that the score is
not playable by the forecaster, i.e., it influences the forecaster
in no undesirable way, and the forecaster is encouraged by the
scoring system to put out the best forecast skill allows.
Probabilities seemingly tailored just to improve the score may
actually hurt the score. Murphy and Epstein (1967) and Murphy
(1970) proved the non-playability of this score, and called it a
"strictly proper" score.

However,  there  is  one  playable  aspect  in  the  verification  system 
in  that  the  forecast  is  usually  verified  by  a  single  gage  known  to 
the  forecaster.  If  the  probability  varies  sizably  over  the  forecast 
area,  the  forecaster  should  use  two  probabilities  for  the  first  period, 
as  discussed  in  the  Definitions  section,  and  not  tailor  the  probability 
to  the  verifying  gage.  This  is  especially  the  case  if  the  gage  is 
not  where  the  bulk  of  the  decision  makers  are  located.  It  would 
be  best  if  the  location  of  the  verifying  gage  were  not  known  to  the 
forecaster.  Then  it  would  be  impossible  to  not  serve  the  whole  area 
equally,  and  one  would  have  to  forecast  the  average  or  use  two 
probabilities.  However,  it  is  difficult  to  keep  knowledge  of  the 
gage  from  the  forecaster,  and  in  some  places  impossible  because  of 
the  lack  of  adequate  gages.  If  the  local  area  is  split  and  two 
probabilities  are  used,  the  one  in  the  area  containing  the  verifying 
gage should, of course, be used as the forecast to verify.

Another  attribute  of  the  score,  also  probably  unique  and  extremely 
important,  is  that  the  score  is  related  to  the  economic  utility  of 
the  forecasts  through  the  cost/loss  principle.  This  was  mentioned 
by  Hughes  (1965),  and  proven  by  Murphy  (1966)  for  a  uniform  cost/ 
loss  ratio  distribution.  For  a  discussion  of  the  more  realistic 
non-uniform  C/L  distribution,  see  Murphy  (1969). 

These  attributes  combine  to  give  both  the  forecaster  and  management 
a  strong  incentive  to  have  the  forecaster  get  the  best  possible 
score--the  most  useful  forecasts.  I  have  heard  forecasters  say  that 
they  want  to  forecast  the  weather,  not  "play  the  numbers  game." 
Management  should  encourage  the  forecaster  to  play  this  numbers 
game,  because  the  scoring  system  is  so  good.  Not  to  do  so  is  a 
mistake, because there is no way to get a better forecast without
getting  a  better  score,  and  there  is  no  way  to  honestly  better  the 
score  without  getting  better  forecasts. 

It  is  a  simple  score  that  is  easy  to  calculate.  The  form  of  the  score 
is  given  by: 

$$P_F = M F^2 + R (1 - 2F) \qquad (4)$$

This gives the total score $P_F$ for a particular forecast probability
F, where M is the number of forecasts of that probability, and R is
the number of precipitation events occurring on those forecasts.
This is a convenient form because $F^2$ and $1 - 2F$ can be calculated
ahead  of  time,  so  only  two  simple  multiplications  are  necessary. 
Of  course  this  must  be  done  for  each  probability  used,  then  these 
are  added  and  the  total  is  then  divided  by  the  number  of  forecasts 
to  get  the  Brier  score. 

Figure 3 from Hughes (1965) shows the score in graphical form.
There  are  several  things  to  note  from  this:  1)  the  50%  forecasts 
get  a  score  of  0.25  no  matter  which  precipitation  event  occurs;  2) 
the  best  score  for  a  particular  observed  frequency  (go  horizontally 
on  the  graph)  is  on  the  diagonal  line--the  line  where  the  observed 
frequency  equals  the  forecast  probability.  This  is  called  the  line 
of  perfect  reliability.  Thus  a  set  of  forecasts  having  the  same 
observed  frequency  as  the  forecast  probability  gets  the  best  score, 
i.e.,  when  they  are  reliable.  The  cost/loss  ratio  principle  assumes 
that  the  forecasts  are  reliable,  so  this  is  a  desirable  attribute; 
3)  the  best  score  for  a  particular  forecast  probability  (go 
vertically  on  the  graph)  is  with  an  observed  frequency  of  zero  for 
probabilities  of  less  than  50%,  and  100%  for  probabilities  over  50%. 
This  seems  to  oppose  the  reliability  concept,  but  it  doesn't.  For 
example,  if  you  forecast  30%  a  number  of  times  and  get  precipitation 
only 10% of the time, you get a score of 0.13, but if you used all
the  skill  you  had  and  forecast  10%  each  time  instead  of  30%,  you 
would  then  be  reliable  and  get  a  score  of  0.09--lower  and  better. 
Operationally  this  says  that  the  forecaster  should  be  less  concerned 
when  looking  at  verification  data  if  the  low  probabilities  have  an 
even lower frequency of precipitation and the high ones an even higher
frequency. This is true especially if the number of forecasts in
these probabilities is not large. Most of the time the problem is the
opposite and shows over-confidence by the forecaster.

Perhaps points 2 and 3 can be seen more clearly as follows. Equation
3 or 4 can be expressed differently, as the average score per forecast
for one forecast probability, as:

$$P_F = (F - \phi)^2 + \phi (1 - \phi) \qquad (5)$$

where $\phi$ is the observed frequency of precipitation of all the
forecasts of probability F. This was devised by Sanders (1963) and
its derivation is easily seen from Hughes (1965, p. 11)⁷. The
$(F - \phi)^2$ reflects reliability error (bias) and says that the
error is zero when the observed frequency $\phi$ is the same as the
forecast probability F (see also Fig. 4). The $\phi (1 - \phi)$
reflects resolution error and is zero only when the observed
frequency is either 0% or 100% for that forecast probability
regardless of what it is (see Fig. 5).
Figs. 4 and 5 are from Hughes (1967c) and are discussed further
there.  Note  in  Fig.  4  that  the  bias  is  relatively  small  even  for 
moderate  unreliability,  and  resolution  (Fig.  5)  is  relatively  large 
for  mid-probability  forecasts.  The  major  portion  of  the  Brier  score 
in  fact  comes  from  poor  resolution. 
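
The two terms can be evaluated directly. The sketch below computes the reliability and resolution terms for a 30% forecast verifying 10% of the time and for a reliable 10% forecast, and compares the total with the score obtained by weighting rain and no-rain days separately. The function names are illustrative.

```python
def decomposed_score(f, phi):
    """Return (reliability term, resolution term, total) for forecast
    probability f and observed frequency phi."""
    reliability = (f - phi) ** 2
    resolution = phi * (1 - phi)
    return reliability, resolution, reliability + resolution

def direct_score(f, phi):
    """Average score when rain days score (f - 1)^2 and dry days score f^2."""
    return phi * (f - 1) ** 2 + (1 - phi) * f ** 2

# Forecast 30% with rain observed 10% of the time, then a reliable 10% forecast.
for f, phi in [(0.3, 0.1), (0.1, 0.1)]:
    rel, res, total = decomposed_score(f, phi)
    print(f"F={f:.0%}, observed={phi:.0%}: "
          f"reliability={rel:.2f}, resolution={res:.2f}, total={total:.2f}")
```

In both cases the resolution term is the larger part of the total, which illustrates the statement that most of the Brier score comes from poor resolution.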

⁷ Take the score for days with rain as $\phi (F - 1)^2$ and the days
without rain as $(1 - \phi) F^2$, add the two, add and subtract
$\phi^2$, and rearrange.

Forecasters have complained about the fact that the Brier score is a
squared function. This aspect has been discussed in depth by
Hughes (1967c), and it was shown that powers of 1 and 3 yield highly
playable systems and thus are much poorer scoring systems than
power 2, whether or not the absolute value of these odd powers is
taken.
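
Why other powers are playable can be seen with a small numerical experiment. This is a simple illustration, not the analysis in Hughes (1967c): it assumes the forecaster's honest judgement is 30% (an invented value) and searches a 1% grid for the forecast that minimizes the expected score when the absolute error is raised to each power.

```python
def expected_score(f, p, power):
    """Expected value of |f - E|**power when the event occurs with probability p."""
    return p * abs(f - 1) ** power + (1 - p) * abs(f - 0) ** power

def best_forecast(p, power):
    """Forecast (on a 1% grid) that minimizes the expected score."""
    grid = [i / 100 for i in range(101)]
    return min(grid, key=lambda f: expected_score(f, p, power))

true_probability = 0.30  # the forecaster's honest judgement (invented)
for power in (1, 2, 3):
    best = best_forecast(true_probability, power)
    print(f"power {power}: best forecast to issue = {best:.2f}")
```

Only with power 2 does the score-minimizing forecast coincide with the honest judgement. Power 1 rewards going to the extreme, and power 3 rewards hedging toward 50%.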

b.  Skill  score  (regular  and  sample) 

While  the  Brier  score  is  the  basic  score  for  verification,  the 
fact  that  low  scores  are  better  confuses  people,  and  the  size  of  the 
number  is  not  in  itself  meaningful,  i.e.,  what  is  a  good  score  is 
not  obvious.  To  try  to  overcome  these  aspects  and  also  to  relate 
the  score  of  the  forecasters  to  a  non-skill  but  before-the-fact 
forecast,  a  skill  score  was  devised  related  to  climatology,  as 
suggested  by  Jorgensen  (1962)  and  along  the  lines  used  by  Sanders 
(1963).  It  is  of  the  form 

$$\frac{100\,(P_C - P_F)}{P_C} \tag{6}$$

where $P_F$ is the Brier score of a set of forecaster probabilities,
and $P_C$ is the Brier score of the same set, but using climatological
probabilities as forecasts. The climatological probabilities are
the long-term (over 10 years) values appropriate to the month, time
of day, and the length of the forecast period, such as those of
Jorgensen (1967) or Hughes (1966b).

A  perfect  skill  score  is  100  in  keeping  with  traditional  grading 
concepts,  zero  indicates  no  skill,  and  negative  scores  are  possible. 
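
The three cases just described (perfect, no skill, negative) can be reproduced with a short Python sketch. This is an illustration only; the function names and the sample data are assumptions, and a single climatological value is used for every forecast, whereas in practice it would vary with month, time of day, and period length.

```python
def brier_score(forecasts, events):
    """Mean squared difference between forecast probabilities and 0/1 events."""
    return sum((f - e) ** 2 for f, e in zip(forecasts, events)) / len(events)


def skill_score(forecasts, events, climatology):
    """Skill score in percent relative to a constant climatological forecast."""
    p_f = brier_score(forecasts, events)
    p_c = brier_score([climatology] * len(events), events)
    return 100 * (p_c - p_f) / p_c


# Hypothetical sample: rain on 2 of 10 periods, climatological frequency 0.2.
events = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
climo = 0.2
perfect = skill_score(events, events, climo)              # forecasts match events
no_skill = skill_score([climo] * 10, events, climo)       # climatology as forecast
overconfident = skill_score([0.9] * 10, events, climo)    # far too wet
print(perfect, no_skill, overconfident)
```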

The  skill  score  is  used  as  the  ultimate  measure  of  the  quality  of 
the  forecasts.  For  a  sizable  set  of  forecasts,  it  has  all  of  the 
desirable  characteristics  given  above  for  the  Brier  score  (see 
Murphy  1973).  However,  100  is  unattainable  because  we  forecast  the 
average  point  probability  which  equates  to  the  observed  areal 
coverage,  so  much  of  the  time  it  is  impossible  to  use  the  very  high 
probabilities,  especially  in  the  warm  season  (see  section  4d). 

Some people have used the sample skill score (e.g., Sadowski and
Cobb 1974). This uses the observed frequency for the forecast sample
(the short-term frequency) instead of the long-term climatological
frequency, and is discussed by Murphy (1974). The scores using the
sample frequency are poorer than scores from the long-term frequency,
and the score is undesirable for two other reasons: 1) the sample
frequency is not a value known ahead of time and therefore it cannot
be used to create a forecast ahead of time against which to
compete, so comparison with it is a different breed of score
from that using long-term climatology; 2) it takes away from the
forecaster some score based on the ability to distinguish deviations
from the long-term climatology, i.e., wet and dry regimes. This is
part of the forecaster's skill for which credit should be given.
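
The reason the sample skill score comes out lower can be shown with a small Python sketch. It is an illustration under assumptions: the data are invented to represent a wet regime, and the function names are hypothetical. The sample frequency is the constant forecast with the lowest possible Brier score for that sample, so it is a harder reference to beat than any value fixed in advance.

```python
def brier_score(forecasts, events):
    """Mean squared difference between forecast probabilities and 0/1 events."""
    return sum((f - e) ** 2 for f, e in zip(forecasts, events)) / len(events)


def skill_score(forecasts, events, reference):
    """Skill score in percent against a constant reference probability."""
    p_f = brier_score(forecasts, events)
    p_c = brier_score([reference] * len(events), events)
    return 100 * (p_c - p_f) / p_c


# Hypothetical wet spell: rain on 4 of 10 periods, long-term frequency 0.2.
events = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
forecasts = [0.7, 0.6, 0.3, 0.8, 0.2, 0.1, 0.6, 0.3, 0.2, 0.1]
long_term = 0.2
sample_frequency = sum(events) / len(events)   # known only after the fact

regular = skill_score(forecasts, events, long_term)
sample_based = skill_score(forecasts, events, sample_frequency)
print(regular, sample_based)
```

In this made-up case the forecaster recognized the wet regime, and the regular score credits that recognition while the sample score removes it.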

c. Skill Score vs. Reduction of Variance

Many  objective  schemes  for  forecasting  are  now  derived  via 
screening-regression,  and  the  measure  of  the  quality  of  the  scheme 
and  the  contribution  of  the  individual  terms  is  given  by  the 
reduction  of  variance.  It  was  noted  and  then  proven  by  Sangster 
(1970)  that  the  reduction  of  variance  and  the  skill  score  are  equal 
under  certain  conditions.  They  are  exactly  equal  if  the  relative 
frequency  of  precipitation  in  the  sample  used  in  the  derivation  is 
exactly  equal  to  the  longer-term  frequency  used  in  verification,  and 
the  probabilities  used  in  the  reduction  of  variance  computation  are 
in  the  range  zero  through  one.  The  deviations  from  these  ideal 
conditions  are  usually  small  enough  and/or  infrequent  enough  that 
there  is  usually  only  a  small  difference  between  the  skill  score 
(in  percent)  and  the  reduction  of  variance  (in  percent).  In  general 
the  deviations  from  the  ideal  conditions  are  such  that  the  reduction 
of  variance  is  usually  slightly  less  than  the  skill  score.  This 
means  that  if  the  reduction  of  variance  of  a  forecast  scheme  is 
better  than  your  skill  score,  the  scheme  should  be  given  a  lot  of 
consideration. 
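
The first condition for equality can be illustrated with a brief Python sketch. It is not Sangster's proof; it only assumes the usual definition of reduction of variance (one minus the ratio of residual to total sum of squares), and the probability estimates below are invented values standing in for the output of a regression scheme.

```python
def reduction_of_variance(estimates, events):
    """Percent reduction of variance of 0/1 events by the estimates."""
    mean = sum(events) / len(events)
    residual = sum((e - x) ** 2 for x, e in zip(estimates, events))
    total = sum((e - mean) ** 2 for e in events)
    return 100 * (1 - residual / total)


def skill_score(forecasts, events, climatology):
    """Skill score in percent against a constant climatological probability."""
    n = len(events)
    p_f = sum((f - e) ** 2 for f, e in zip(forecasts, events)) / n
    p_c = sum((climatology - e) ** 2 for e in events) / n
    return 100 * (p_c - p_f) / p_c


# Hypothetical sample with rain frequency 0.2 and estimates within 0 to 1.
events = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
estimates = [0.6, 0.1, 0.2, 0.0, 0.7, 0.3, 0.1, 0.0, 0.2, 0.1]

rv = reduction_of_variance(estimates, events)
ss_same = skill_score(estimates, events, 0.2)    # climatology equals sample frequency
ss_other = skill_score(estimates, events, 0.3)   # climatology differs from the sample
print(rv, ss_same, ss_other)
```

When the climatological value equals the sample frequency the two measures coincide; when it differs, the skill score in this example is slightly higher than the reduction of variance.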

d.  Use  of  Multiple  Gages 

Forecasters  dislike  the  use  of  only  one  rain  gage  to  verify 
their  probability  forecasts,  even  though  a  point  probability  is 
correctly  verified  that  way.  However,  they  do  have  a  point, 
because  their  forecast,  as  noted  earlier,  is  the  average  point 
probability  in  the  forecast  area,  since  it  should  apply  equally  to 
any  point  in  the  area.  If  we  had  a  number  of  gages  in  the  forecast 
area,  how  should  they  be  used  to  properly  verify  the  probability 
forecast?  It  would  not  be  correct  to  have  measurable  rain  at  any 
gage  be  called  a  rain  event  because  that  would  be  verifying  the 
areal  probability,  not  the  point  probability.  One  correct  way 
would  be  to  apply  the  forecast  probability  to  each  gage-point  and 
verify  each  gage-point  separately,  then  average  the  result.  One  can 
see  from  this  that  as  long  as  there  are  no  local  effects  in  the 
forecast  area  that  cause  some  places  to  regularly  get  more  rain  than 
other  places,  the  result  from  any  gage  and  result  for  all  gages 
averaged  together  would  be  the  same  in  the  long  run. 
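
The correct and incorrect uses of several gages can be contrasted in a short Python sketch. The gage data and function names are hypothetical and serve only to illustrate the two procedures described above.

```python
def brier_score(forecasts, events):
    """Mean squared difference between forecast probabilities and 0/1 events."""
    return sum((f - e) ** 2 for f, e in zip(forecasts, events)) / len(events)


def multi_gage_score(forecasts, gage_events):
    """Verify the same forecasts at each gage separately, then average."""
    per_gage = [brier_score(forecasts, events) for events in gage_events]
    return sum(per_gage) / len(per_gage)


def any_gage_score(forecasts, gage_events):
    """Not correct for point probability: counts rain at any gage as an event,
    which verifies the areal probability instead."""
    any_rain = [max(period) for period in zip(*gage_events)]
    return brier_score(forecasts, any_rain)


# Hypothetical data: four forecast periods, three gages (rows) in the area.
forecasts = [0.3, 0.6, 0.1, 0.5]
gage_events = [
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
point_score = multi_gage_score(forecasts, gage_events)
areal_score = any_gage_score(forecasts, gage_events)
print(point_score, areal_score)
```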

Thus the main advantage of multiple gages is that it shortens the
time to get a representative sample. However, a sample of 12 months
of forecasts verified as a set should get close to the same result
as using many gages, especially since we are verifying rain frequency.
QPF probability verification will be more difficult and require
larger samples. The disadvantage of multiple gages is the workload
of both gathering and processing the data.

e. Modification of Brier Score

If  an  adequate  number  of  rain  gages  existed  in  each  local 
area  or  the  actual  areal  coverage  could  be  obtained  reasonably  from 
radar,  it  should  be  better  to  use  a  further  modification  of  the  Brier 
score,  as  discussed  by  Curran  and  Hughes  (1968)  and  by  D.  Smith 
(1977),  for  verifying  the  average  point  probability  as  forecast  for 
the  local  area.  The  form  of  the  score  would  be  the  same  as  the 
Brier score shown earlier except that the E in the score (Eq. 3),
instead  of  being  only  zero  or  one  as  in  the  true  Brier  score, 
becomes  the  areal  coverage  observed,  i.e.,  the  portion  of  the  local 
area  actually  receiving  measurable  precipitation  in  the  time  period 
of  concern. 
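
A minimal Python sketch of such a modified score follows. It is an illustration under stated assumptions: areal coverage is approximated by the fraction of gages reporting measurable precipitation, the gage data are invented, and the function names are hypothetical.

```python
def areal_coverage(gage_events):
    """Fraction of gages with measurable precipitation in each period."""
    return [sum(period) / len(period) for period in zip(*gage_events)]


def modified_brier_score(forecasts, coverages):
    """Brier-type score with the 0/1 event replaced by observed areal coverage."""
    return sum((f - c) ** 2 for f, c in zip(forecasts, coverages)) / len(forecasts)


# Hypothetical data: four forecast periods, three gages (rows) in the area.
forecasts = [0.3, 0.6, 0.1, 0.5]
gage_events = [
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
coverages = areal_coverage(gage_events)

modified = modified_brier_score(forecasts, coverages)
perfect = modified_brier_score(coverages, coverages)   # forecast equals coverage

# Part of the ordinary point score that no forecast can remove:
# the mean of coverage * (1 - coverage) over the periods.
irreducible = sum(c * (1 - c) for c in coverages) / len(coverages)
print(modified, perfect, irreducible)
```

For these numbers the modified score is far lower than the score obtained by verifying each gage as a 0/1 event, and the difference is exactly the `irreducible` term, which depends only on the observed coverage.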

Such  a  score  would  take  out  some  of  the  scatter  shown  earlier  and 
would  certainly  give  much  higher  scores  and  therefore  a  psychological 
lift  to  the  forecasters.  It  would  greatly  alleviate  the  trace- 
measurable  problem  and  it  would  also  alleviate  other  problems  of 
using  one  location  to  obtain  the  verifying  data,  a  method  that  could 
also  compromise  the  probability  a  bit,  at  least  for  the  first  period 
as  discussed  earlier.  This  method  would  also  allow  a  representative 
sample  to  be  achieved  faster  because  the  minimum  number  of  forecasts 
that  could  make  a  forecast  probability  reliable  would  be  only  one 
(if  10  gages  were  available),  no  matter  what  the  probability,  instead 
of,  for  example,  ten  forecasts  for  a  90%  probability.  The  method 
then  overcomes  the  first  two  (the  main)  reasons  for  non-acceptance 
of  a  verification  system  mentioned  by  Gringorten  (1967).  It  could 
be done now in selected locations because some places have reasonably
good gage coverage; however, the logistics of verification would be
considerably  increased  as  a  result. 

In  a  sense  this  new  score  would  be  saying  that  it  is  impossible  to 
forecast  exactly  which  part  of  the  local  area  will  be  hit  by  rain 
when  all  parts  are  not  hit,  and  this  generally  is  the  case  except 
where  strong  local  effects  are  acting.  Thus  a  perfect  forecast  of 
the  average  point  probability  would  be  that  of  the  observed  areal 
coverage.  Such  a  forecast  is  possible,  so  perfect  forecasts  would 
be  possible.  They  are  not  possible  now  except  in  areas  where  the 
areal  coverage  is  always  zero  or  100%,  if  such  areas  exist.  However, 
the  best  possible  score  now  is  still  achieved  by  forecasts  equal  to 
the  observed  areal  coverage,  even  though  the  Brier  or  Skill  Score 
would show them to be imperfect. Nothing in the above should be
taken as saying one should issue forecasts of areal coverage. Point
probability is the thing (see section 4d).

For  those  wanting  more  on  verification  in  general,  Muller  (1944) 
gives  a  summary  of  55  older  papers  on  the  subject  listed  in 
chronological  order  from  1884  to  1943.  A  discussion  of  a  comprehensive 
probability verification program and its use is given by Hughes
(1967b)  and  the  next  section  of  this  paper  is  devoted  to  problems 
shown  by  such  a  comprehensive  verification  in  the  past. 

At the other extreme is a very recent paper by Stael von Holstein
and Murphy (1978) which discusses the family of quadratic scoring
rules, of which the Brier score is a special case. Perhaps other
scores need to be used. This will probably be the case if justice
is to be done to probability forecasts of such things as ceiling
height or the amount of precipitation, because in these situations
it would seem that a small error should not be penalized as much as
a large error. This sensitive-to-distance aspect is discussed in
the paper by Stael von Holstein and Murphy and its references.

8.  Problems  shown  by  Verification 

The  material  below  is  based  on  Hughes  (1979),  which  in  turn  is 
based  on  over  13  years  of  monthly  probability  verification  for  each 
of  66  Weather  Service  offices  and  for  each  forecaster  in  each  office. 
The verification program gave rapid feedback to forecasters and
extensive subset verification, such as by season, time of day, lead
time, individual forecaster, and guidance. Such subsets are essential
because bias is the main problem, and many types of bias have positive
and negative aspects that can cancel or appear minor when subsets
are combined. The basic score used is the skill score discussed
above. Bias is given by


$$B = \frac{R_F - R_O}{R_O} \times 100, \tag{7}$$

where $R_F$ is the forecast frequency of precipitation of the set of
forecasts, which is simply the sum of the probabilities, and $R_O$ is
the observed frequency of precipitation for the periods forecast. A
value of zero indicates no bias.
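
The bias measure, and the way opposite biases in subsets can hide each other, can be sketched in Python as follows. The night and day samples are invented for illustration, and the function name `bias_percent` is an assumption.

```python
def bias_percent(forecasts, events):
    """Bias in percent: forecast minus observed precipitation frequency,
    relative to the observed frequency (undefined if no rain was observed)."""
    r_f = sum(forecasts)   # forecast frequency: sum of the probabilities
    r_o = sum(events)      # observed frequency: number of periods with rain
    return 100 * (r_f - r_o) / r_o


# Hypothetical subsets: night forecasts too dry, day forecasts too wet.
night_forecasts = [0.2, 0.3, 0.2, 0.3, 0.2]
night_events = [0, 1, 0, 1, 0]
day_forecasts = [0.6, 0.5, 0.6, 0.5, 0.6]
day_events = [1, 0, 1, 0, 0]

night_bias = bias_percent(night_forecasts, night_events)
day_bias = bias_percent(day_forecasts, day_events)
combined_bias = bias_percent(night_forecasts + day_forecasts,
                             night_events + day_events)
print(night_bias, day_bias, combined_bias)
```

Here the subsets show biases of about -40% and +40%, yet the combined set shows essentially none.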

Bias is subject to manipulation by forecasters, but it should be
reduced only by methods which will raise the skill score; therefore,
it is necessary to instruct forecasters in the proper and improper
ways to do it. The measure is a quick tool to spot problems; however,
bias in subsets can compensate when the sets are considered together,
so subset usage is highly desirable, as will be shown later. While
zero bias is the goal, in subsets and overall, it is impossible and
perhaps undesirable to obtain in the usual sample sizes used. In a
year's forecasts, about 4400 from one office (from three probabilities
per shift and four shifts per day), an acceptable and probably
irreducible overall bias would be less than 10%. However, overall
biases of less than 20% are sometimes hard to reduce because the
problem is more random error than systematic error. Essentially
zero bias should exist in probabilities of 0% and 100% no matter
how small the sample size.

a.  New  Forecaster  Bias 

Virtually every forecaster, when starting to use probability
forecasts, has the systematic error of initially overestimating his/her
skill by excessive use of very high and very low probabilities.
Forecast skill naturally decreases with increasing forecast lead time
(the time between forecast preparation and the beginning of the
period for which the forecast is applicable). This means that the
usable range of precipitation probabilities available to the
forecaster shrinks with lead time to the single value of the climatological
frequency  at  some  time  in  the  future,  something  like  that  shown  in 
Fig.  6  (Hughes  1966a).  Although  most  forecasters  can  reliably  use 
precipitation  probabilities  from  zero  to  100%  in  the  first  12-h 
period,  by  the  third  such  period  the  probability  limits  for  reliable 
forecasts  have  shrunk  considerably.  A  diagram  such  as  Fig.  6 
therefore  serves  as  a  useful  warning  to  new  forecasters  who  are 
overly  tempted  to  use  extreme  probability  values  in  long  lead  time 
forecasts.  When  constructed  from  past  forecasts  as  discussed  next, 
it  can  show  forecasters  specific  limits  of  their  skills. 

Fig.  7  demonstrates  how  range  limits  can  be  obtained  after  adequate 
data  become  available  (data  taken  from  an  actual  forecast  sample). 
The  points  indicate  the  observed  frequency  of  precipitation  for  each 
forecast  probability.  The  solid  line  indicates  equality  of  these 
two,  giving  the  desirable  goal  called  "perfect  reliability".  The 
dashed  line  indicates  the  effective  upper  limit  of  skill.  It  was 
calculated  by  obtaining  the  observed  frequency  of  precipitation  for 
the  set  of  unreliable  forecasts.  These  were  the  high  probability 
forecasts,  i.e.,  60%  or  more.  This  frequency  is  about  60%.  One 
could also consider 70% as an upper limit, but probabilities > 70%
have an observed frequency just under 65% and the set therefore gets
a slightly poorer score at 70% than at 60%. For the lower limit, the
zero  forecast  probability  shows  an  observed  frequency  of  about  5%. 
Thus  the  limits  in  this  case  are  5%  and  60%  (or  possibly  70%).  This 
can  be  repeated  for  the  other  forecast  periods  and  for  each  forecaster. 
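
The calculation of such limits can be sketched in Python. The counts below are hypothetical, chosen only to resemble the situation described for Fig. 7; they are not the data behind the figure, and the function name is an assumption.

```python
def observed_frequency(table, probabilities):
    """Pooled observed frequency of rain for the listed forecast probabilities."""
    n_forecasts = sum(table[p][0] for p in probabilities)
    n_rain = sum(table[p][1] for p in probabilities)
    return n_rain / n_forecasts


# Hypothetical verification counts for one forecast period:
# forecast probability -> (number of forecasts, number followed by rain)
table = {
    0.0: (200, 10),
    0.2: (120, 25),
    0.4: (80, 33),
    0.6: (40, 22),
    0.7: (30, 19),
    0.8: (20, 13),
    0.9: (10, 7),
    1.0: (5, 3),
}

# Upper limit: pooled frequency for the unreliable high-probability forecasts.
upper_limit = observed_frequency(table, [p for p in table if p >= 0.6])
alternative = observed_frequency(table, [p for p in table if p >= 0.7])
# Lower limit: observed frequency when 0% was forecast.
lower_limit = observed_frequency(table, [0.0])
print(lower_limit, upper_limit, alternative)
```

With these invented counts the limits come out near 5% and 61%, and the pooled frequency for forecasts of 70% or more is just under 65%.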

These  limits  can  change  over  a  sizable  period  of  time  due  to  a 
change  in  forecast  skill.  They  are  usually  dependent  on  the  climatic 
frequency  of  precipitation  as  well,  both  tending  to  be  higher 
(lower)  for  locations  with  a  climatic  frequency  that  is  higher 
(lower).  Once  such  limits  are  obtained,  they  should  not  be  used  as 
absolute  boundaries,  but  should  act  as  a  "red  flag"  warning  to 
proceed  with  caution.  They  can  be  exceeded,  but  only  under 
unusually  favorable  conditions,  for  example  with  a  slowly  moving 
major  storm,  especially  a  hurricane,  or  in  a  pronounced  dry  spell. 
Reasonable values for the present for 12 h forecast periods in a
climatology such as that of Iowa would be "red flag" limits of
2-80% for the second period of the forecast and 5-70% for the third
period.

b. 6 h and Day-Night Bias

Two  other  widespread  forecast  problems  are  evident  from  the 
bias  values  given  in  Table  2.     These  values  were  computed  as  an 
average  from  first  period  forecasts  made  at  66  stations   in  a  2  year 
period  during  the  earlier  years  of  the  Central   Region  program.     N 
indicates  forecasts  valid  for  the  night  period,  D  indicates  day- 
valid  forecasts,  and  subscripts  6  and  12  indicate  forecasts  for 
period  lengths  of  six  and  twelve  hours  respectively.     The  warm 
season  was  April   through  September,  with  the  cold  season  October 
through  March. 

The  first  problem  evident  from  these  data  is  the  overforecasting  of 
precipitation  in  6  h  periods  no  matter  what  the  season  or  time  of 
day.     This  is  the  most  prominent,  widespread,  and  persistent  bias 
noted  in  the  entire  verification  program.     That  this  overforecasting 
stems  mainly  from  a  misunderstanding  on  the  part  of  forecasters  of 
the  effect  of  forecast  period  length  on  forecast  probability,  as 
mentioned  earlier,  was  noted  by  Winkler  and  Murphy  (1968)  and  fully 
confirmed  by  Murphy  and  Winkler  (1974b).     Climatological   data  such 
as  that  of  Jorgensen  (1967)  show  that  over  most  of  the  United  States, 
the  climatic  frequency  of  precipitation  decreases  about  one-third 
when  the  forecast  period  length  decreases  by  one-half.     Forecasters 
tend  to  ignore  this  when  updating  a  12  h  forecast  to  a  6  h  forecast, 
so  that  if  conditions  have  remained  essentially  unchanged,   they  tend 
to leave the precipitation probability unchanged instead of reducing
it, on the average, by about one-third, as they should. This
produces  a  considerable  overforecast  for  the  6  h  period.     Of  course, 
in  some  cases  if  later  events  show  that  the  12  h  forecast  being 
updated  was  wrong,   the  6  h  probability  may  correctly  be  higher  than 
the  12  h  value.     However,   in  the  long  run,   if  the  6  h  forecasts  are 
to  be  reasonably  reliable,  the  average  probability  must  be  close  to 
the  precipitation  frequency.     For  this  reason,   the  probability  for 
very  short  periods,  such  as  an  hour  or  so,  must  be  quite  a  bit 
lower  than  even  the  6  h  probability,  on  the  average.     This  means 
that with the same lead time it is harder to forecast the very high
probabilities  reliably  in  a  6  h  period  than  in  a  12  h  period. 
Conversely,   it  is  harder  to  forecast  extremely  low  probabilities  in 
the  12  h  period. 
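
The size of the average adjustment can be put into a small Python helper. This is a rough sketch only: the function name is hypothetical, the factor of two-thirds per halving is the climatological rule of thumb quoted above, and applying the same factor to repeated halvings or doublings is an assumption. Any actual adjustment should rest on the climatic frequencies of the specific periods involved.

```python
import math


def adjust_for_period_length(probability, from_hours, to_hours,
                             factor_per_halving=2 / 3):
    """First-guess probability for a period of a different length,
    assuming conditions are otherwise unchanged."""
    halvings = math.log2(from_hours / to_hours)   # negative when lengthening
    adjusted = probability * factor_per_halving ** halvings
    return min(1.0, adjusted)                     # a probability cannot exceed 1


six_hour = adjust_for_period_length(0.30, 12, 6)      # shorten 12 h to 6 h
twelve_hour = adjust_for_period_length(0.30, 6, 12)   # lengthen 6 h to 12 h
print(six_hour, twelve_hour)
```

Under these assumptions a 30% probability for 12 h becomes about 20% for 6 h, while a 30% probability for 6 h becomes about 45% for 12 h.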

Table 2. Bias values (%) from forecasts made at 66 stations
in a two-year period for night-valid (N) and day-
valid (D) 6 and 12 h first periods only.

               N12     N6    D12     D6
Warm season      1     23     12     40
Cold season      6     22     11     41

Improperly adjusting for the length of the forecast period is
probably a poorer way to think of this problem than improperly
adjusting for the climatic frequency. This is especially important
and noticeable when the precipitation frequency is markedly different
among the 6 h periods. An excellent example of this is Denver,
which has precipitation frequencies for summer, according to
Jorgensen (1967) for 6 h periods starting 0000 GMT, of .15 and .05
(early tonight and late tonight, respectively) and .03 and .14 (this
morning and this afternoon), while the 12 h periods have .17 and
.15, essentially the same.

With an update forecast made in late morning, with the first period
being "this afternoon" (6 h), the climatic frequency for "this
afternoon" is little different from that for "today" (.14 vs. .15),
so little or no reduction of the probability is necessary, on the
average. However, for the update forecast made 12 hours later, the
frequency for "late tonight" is much smaller than for "tonight" (.05
vs. .17), and a large reduction must usually be made if the forecasts
are to be reliable. In a 12 month sample of warm season forecasts
made for Denver, the bias for "this afternoon" was only 3%, which is
trivial and about half what the bias would have been if the 12 h
probability were used as a 6 h forecast. That for "late tonight" was
95%, which is very large but still only about half what it would have
been had the 12 h forecast been used. These points indicate insufficient
knowledge of the necessary adjustment. If the 12 h climatic
frequencies were used as 6 h forecasts, the biases would be (.15 -
.14)/.14 = 7% and (.17 - .05)/.05 = 240% for the afternoon and late
tonight periods, respectively. This clearly shows that each office
must compare its 6 and 12 h probabilities and take appropriate
action on the update forecasts or when splitting a 12 h period into
two 6 h periods.

Getting  forecasters  to  "think  small"  when  updating  a  12  h  probability 
forecast  to  a  6  h  forecast  has  been  a  lengthy  and  difficult  task, 
and  it  is  not  yet  fully  accomplished.     It  is  probably  aggravated  by 
the  fact  that  it  is  not  necessary  for  other  variables  forecast,  such 
as  temperature,   ceiling,  or  visibility.     However,   unless  the 
probability  verification  is  concerned  with  the  6  h  period,  as  a 
subset  (and  few  are),   this  bad  bias  will   remain  undetected. 

The opposite problem with period length came to light recently with
a forecast which gave 30% for this afternoon (a 6 h period) and 40%
for tonight (a 12 h period). When queried, the forecaster said that
the higher probability for tonight was because a line of showers was
due in the area early tonight. This forecaster did not realize
that this was an inconsistent set of forecasts for that reasoning.
When going from a 6 to a 12 h period one has to do the opposite of
thinking small, and think big. A first guess at a 12 h probability
starting with the 6 h value of 30% would be about 45%, half again
as large, based on climatology. But if the hour-by-hour probability
were to increase as the line of showers came in, the 45% should be
raised to perhaps 60%. Thus if 30% was a good forecast for this
afternoon, the 40% was poor for tonight.

The second problem seen in Table 2 is that of greater overforecasting
of  precipitation  during  the  daytime  regardless  of  period  length. 
This  bias  is  not  universal.  Both  of  these  problems  are  more  evident 
in  Table  3  based  on  forecasts  at  one  particular  station  for  a  12 
month  set  of  warm  season  months.  The  6  h  bias  is  severe  in  the 
daytime.  The  last  column  in  the  table  shows  that  the  prominent 
daytime  bias  would  revert  to  a  much  smaller  bias  for  the  12  h 
forecasts  if  day  and  night  forecasts  had  been  combined. 

What  is  the  reason  for  this  daytime  bias?  In  this  case  it  appeared 
to  be  related  to  the  time  of  occurrence  of  "afternoon"  showers. 
Even  though  forecasters  think  of  them  as  occurring  in  the  afternoon 
or  evening,  they  expected  too  many  of  them  to  start  in  the  afternoon 
period,  i.e.,  used  too  high  a  probability,  for  day-valid  forecasts. 
Knowledge of the hourly frequency of precipitation can assist in
such forecasts. In the office from which the Table 3 forecasts
came, the time of maximum precipitation frequency was essentially
at the dividing line between the afternoon and night periods (see
Topil 1963). Realization of their bias and learning of this
climatological fact greatly reduced the bias thereafter.


By the way, some might say that the dividing time between forecast
periods  should  not  be  near  a  time  of  maximum  precipitation  frequency, 
but  should  be  adjusted  so  as  to  bracket  that  time.  However,  the 
forecast  periods  should  really  be  proper  for  the  decision  maker,  not 
the  forecaster.  They  are  reasonably  well  suited  to  the  user  now, 
as  mentioned  earlier,  because  weather  dependent  decisions  are  usually 
either  for  the  work  day  or  for  the  evening  play  period. 

c.  Variations  among  Forecasters 

Some  of  the  variation  in  scores  among  forecasters  could  be 
due  to  attitude.  H.  Roberts  (1968)  said  that  "small  stakes  may  not 
provide  the  incentive  for  a  careful  assessment."  The  large  majority 
of  forecasters  are  very  conscientious  and  work  hard  to  get  the  best 
possible  product.  Perhaps  a  comprehensive  verification  with 
individual  forecaster  scores  available  to  forecasters  and  their 
supervisors,  plus  the  comparison  with  MOS  guidance  and  a  comparative 
ranking  of  forecast  offices  provides  additional  incentive  for 
maximum  effort. 

Table  4,  from  real  data  for  a  particular  station,  further  emphasizes 
the  importance  of  forecast  verification  by  subsets.  These  bias 
data  demonstrate  the  large  differences  which  can  exist  among 
forecasters at a station. Forecaster 5 appears to have no appreciable
problem. Forecaster 1 has mainly a 6 h problem, but it is bad.
Forecaster  2  has  a  prominent  6  h  bias  plus  more  of  a  problem  on 
shifts  producing  forecasts  near  4  am  and  4  pm  (lead  times  of  2,  14, 
and  26  hours)  which  requires  three  12  h  forecasts  instead  of  one  6 
h  and  two  12  h.  Since  the  latter  set  of  forecasts  is  at  update  time 
and is for essentially the same forecast periods as made by the
previous forecaster, whereas the others are more original, this
could reflect less experience or skill.

Table 3. One-station forecast bias (%) for two warm seasons,
all periods for various lead times.

Lead time (h)      Night     Day    Average

12 h forecasts
     26               1       39       20
     20              -1       33       16
     14               3       40       21
      8               1       30       15
      2              -2       39       18

6 h forecasts        17       91       54

Table 4. Individual forecaster bias (%) for various lead times.

                            Forecaster
Lead time (h)        1      2      3      4      5

12 h forecasts
     26             13     28     24     12      5
     20             21      6     22     16     -2
     14              8     20     26     20      0
      8              8      8     22     37    -13
      2              4     32     19     17    -15

6 h forecasts       82     56     27     36

Number of
forecasts          876    840    819    792    834

Another bias of some forecasters, which was fairly common and
pronounced early in the program, is that of overuse or avoidance of
particular probabilities. If the verification data show the number
of forecasts in each probability, it is easy to note irregular
variations in frequency. Most frequency curves for three forecast
periods combined are bimodal, especially in the warm season, with a
maximum at zero probability and another near the climatic frequency.
A major deviation from a smooth-curve frequency distribution
suggests bias if it occurs in a sizable sample of forecasts. The
most common bias is the underuse of 50% probability, although
overuse also exists. The avoidance of 90% is another abnormality.
The 50% problem exists to some extent even today as a vestige of
the categorical concept. Since the best forecast on a particular
day would be the areal coverage observed on that day, as stated
earlier in the discussion of areal coverage, and there is little
reason to believe any particular areal coverage has a much
different frequency from a nearby value (other than zero frequency),
there is no natural reason for other than a smooth frequency curve,
and 50% probability is the best forecast on some days, as a result.
See D. Smith (1977) for areal coverage frequencies derived from
radar. These verify the above.

d. Skill vs. Distance

Another problem is that forecaster skill may deteriorate as
distance of the forecast area from the forecaster's home station
increases. This problem is also more serious for some forecasters
than for others. Table 5 is based on data collected by Sanford
Miller, an NWS forecaster now retired. He used three years of
routine forecasts made by himself and fellow forecasters for three
12-h periods twice a day at his office. A is the home station, and
the distance of the other stations from the home increases from left
to right in the table with maximum distance about 550 km.

Note that the average score decreases as the distance from home
increases. However, the precipitation frequency also mostly decreases
with distance and may contribute to the result (see Figure 8 and its
discussion below), but this would not be decisive. Note also that
the scores were nearly homogeneous at the home station, a result in
accord with the study by Gregg (1969), since Miller et al. had long
experience at this location. However, the scores are far from
homogeneous at the other locations, suggesting that Gregg's finding
applies only to the home station. Note also that the scores for
forecaster 1 have a small range, while those for forecaster 5 have
a large range, with most of the problem at stations D and E. This
suggests that forecaster 5 may be deficient in knowledge of the
climatology of these locations, and further training is required.
Table 5. Individual forecaster skill scores for stations at
different distances.

                              Station
Forecaster           A       B       C       D       E

    1               21      17      15      14      14
    2               25      23      22      13       9
    3               26      26      19      12      17
    4               26      25      22      13       8
    5               25      19      19       6       7

Average             25      22      19      12      11

Precipitation
frequency        0.187   0.172   0.189   0.135   0.126

Figure 8. Skill score vs. precipitation frequency (winter)

While Gregg's result of station homogeneity may well apply to the
home  location  and  experienced  forecasters  who  have  worked  together 
for  some  time,   it  doesn't  necessarily  apply  to  less  trained  persons 
or  persons  less  experienced  at  a  particular  location.     Thus  even 
verification  for  only  the  home  location  can  show  up  major  differences 
among  forecasters.     As  an  example  from  a  particular  station,  the  six 
forecasters  on  station  had  skill   scores  of  18,  8,  27,  29,  18,  and 
-98.     It's  obvious  that  the  score  of  -98  indicates  a  forecaster 
greatly  in  need  of  help.     One  must  be  certain  to  check  that  the 
number  of  forecasts  involved  is  sufficient  to  make  the  score 
representative,  and  that  the  precipitation  frequency  was  not  highly 
abnormal   for  this  one  forecaster.     However,  this  was  not  the  case. 
Instead,  he  had  come  from  a  part  of  the  country  where  there  was  a 
completely  different  climatology,  and  he  had  not  yet  adjusted.     His 
guidance  was  much  better  than  -98,  so  he  was  told  to  follow  his 
guidance  without  change,   until   he  had  adjusted  to  the  climatic 
conditions  at  the  new  station. 

e.  Sample  Size 

For  routine  verification,  we  experimented  with  the  combinations  of 
months  to  be  treated  as  a  group.     We  soon  felt  that  seasonal 
differences  should  be  significant,  mainly  because  of  areal   coverage 
differences.     We  first  used  a  6-month  season,   using  only  one  season 
at  a  time.     This  is  satisfactory  for  whole-station  figures,  but  the 
number of forecasts for individual forecasters, even those
forecasting irregularly, was small. There were not enough
forecasts of the lesser used (high) probabilities to judge
bias adequately.

While  we  have  continued  into  the  present  with  monthly  verification 
of  12  months  of  data  (two  6-month  seasons;  add  a  month's  data  and 
drop  a  month's  each  time),  we  have  also  made  some  special   combinations 
to  bring  out  specific  features  in  the  data.     Some  of  these  are 
discussed  below.     There  now  exist  over  13  years  of  forecast  and 
guidance  information  for  66  stations,  and  the  potential  of  these  data 
has  not  been  exhausted. 

f.  Seasonal   Effects 

In  Fig.  8,   from  Hughes   (1968d),  skill   score  is  plotted  for 
each  station  in  the  region  against  the  observed  precipitation 
frequency  for  all   forecasts  in  two  3-month  winter  (D-J-F)  seasons. 
The  regression  line  shown  (solid)  is  for  the  points  left  of  the 
dashed  line  and  clearly  shows  a  dependence  of  skill   score  on 
precipitation  frequency,  with  a  correlation  coefficient  of  0.40.     If 
the  points  to  the  right  of  the  dashed  line  were  included,  there 
would  be  little  dependence,  especially  as  determined  by  linear 
regression.     These  points  were  omitted  because  they  are  a  unique 
group  with  a  special   problem.     They  are  all   frequently  in  air  that 
has  lake-effect  precipitation,  as  they  are  all   the  NWS  offices  in 
upper  and  lower  Michigan,  except  Detroit,  plus  Chicago,  Fort  Wayne, 
and South Bend, around the south end of Lake Michigan, and finally
Duluth  on  the  west  shore  of  Lake  Superior.  The  three  stations 
closest  to  but  to  the  left  of  the  dashed  line  probably  also  have 
some  lake  effect  as  they  are  Detroit,  Milwaukee,  and  Green  Bay. 

The  lake  effect  that  is  apparently  suppressing  scores  is  probably 
the  lake-created  snow  flurries  acting  through  the  trace  problem. 
Many  times  in  the  winter  season  the  lake  effect  produces  small 
amounts  of  precipitation  in  which  the  possibility  of  it  being  a 
trace  instead  of  the  measurable  amount  being  forecast  is  quite  great, 
especially  since  the  precipitation  is  snow  and  therefore  harder  to 
measure  accurately.  The  problem  shows  up  through  the  equation 

$$P_m = P_a - P_t \qquad (8)$$

where $P_m$ is the probability of measurable precipitation being
forecast, $P_a$ is the probability of any precipitation, and $P_t$ is the
probability of only a trace amount. Thus even if the forecaster
feels strongly that there will be lake-effect snow in his area ($P_a$
is large), but the amount will be small, as is common, $P_t$ will be
fairly large so his forecast of $P_m$ cannot be large. This causes
the  deviations  from  climatology  to  be  smaller,  and  a  poorer  skill 
score  results.  The  problem  is  further  aggravated  because  the  lake 
effect  creates  many  more  days  with  precipitation.  Because  these 
days  are  mostly  with  small  amounts,  the  skill  score  stays  low  for 
them,  and,  on  the  other  hand,  there  are  now  fewer  no-precipitation 
days  on  which  to  catch  up  some  points  in  the  score  by  forecasting 
considerably  lower  than  the  climatic  frequency. 
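
As a rough illustration of Eq. (8), the short Python sketch below works one lake-effect case. The function name and the sample probabilities are assumptions chosen for illustration only; they are not taken from the verification data.

```python
def measurable_probability(p_any: float, p_trace_only: float) -> float:
    """Eq. (8): P_m = P_a - P_t, with all values as decimal fractions."""
    if not 0.0 <= p_trace_only <= p_any <= 1.0:
        raise ValueError("need 0 <= P_t <= P_a <= 1")
    return p_any - p_trace_only

# Hypothetical lake-effect case: flurries are very likely (P_a = 0.90),
# but most of that likelihood is for a trace only (P_t = 0.60).
p_m = measurable_probability(0.90, 0.60)
print(f"P_m = {p_m:.2f}")
```

Even with near certainty of some snow, the forecast of measurable precipitation stays close to the climatic frequency, which is the score-suppressing effect described above.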

Similar  studies  with  the  fall  and  spring  seasons  (Hughes  1968c  and 
1969a)  have  shown  that  the  dependence  of  score  on  the  precipitation 
frequency  is  about  the  same  in  these  seasons  as  in  winter.  However, 
the  lake  effect  is  present  in  the  fall  but  not  in  the  spring.  This 
variation  in  the  lake  effect  is  reasonable  when  one  considers  the 
seasonal  variations  in  air-water  temperature  difference,  so  critical 
in  lake-induced  showers.  The  maximum  difference  is  in  the  depths  of 
winter  because  the  lake  temperature  changes  little  after  October  as 
it  approaches  and  then  slightly  passes  the  temperature  of  maximum 
density (about 4°C), while the air continues to cool quite a bit
possibly  into  early  February.  Thus  the  lake  effect  is  strongest  in 
winter,  but  is  strong  even  in  the  fall.  Of  course,  in  spring  the 
warmer  air  over  the  cold  water  suppresses  convection.  Incidentally, 
the  S  score  average  of  all  stations  was  a  little  lower  in  the  fall 
than  in  the  winter,  and  a  little  lower  still  in  spring.  To  check 
out  this  lake  effect  on  later  data,  12  months  of  cold  season 
(October-March)  data  ending  March  1978  were  examined.  The  data 
showed the same character. From all these data it is clear that the
strongest lake effects on scores are in the Upper Peninsula of
Michigan and around the southeast shore of Lake Michigan.

This  dependence  of  skill  score  on  precipitation  frequency  was 
hypothesized  by  Hughes  (1968a)  as  a  parabola-type  curve  showing  a 
maximum score at a precipitation frequency of about 50% and
approaching zero score as precipitation frequencies of zero and 100%
are approached (the skill score is not defined at precipitation
frequencies of exactly 0% or 100%). This dependence was developed further by
Glahn  and  Jorgensen  (1970),  who  showed  the  parabola-type  relationship 
with  the  Brier  score  of  a  number  of  stations  for  one  cold  season 
and  one  warm  season.  They  also  showed  a  lesser  dependence  in  the 
skill  score;  thus  the  skill  score  does  remove  some  effect  of  the 
precipitation  frequency  on  score. 
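
One simple way to see why a parabola-type curve is plausible is to look at the half Brier score (the squared error of the probability, averaged over forecasts) earned by always forecasting the climatic frequency. That score equals f(1 - f), which peaks at a frequency of 50% and falls to zero at 0% and 100%. The sketch below is only an illustration of that arithmetic, not the calculation used in the cited papers, and the function name is an assumption.

```python
def climatological_half_brier(freq: float) -> float:
    """Average half Brier score when the forecast always equals the
    observed precipitation frequency `freq` (a decimal fraction)."""
    # Rain occurs on a fraction `freq` of occasions (error 1 - freq);
    # no rain occurs on the rest (error freq).
    return freq * (1.0 - freq) ** 2 + (1.0 - freq) * freq ** 2

for pct in (5, 10, 20, 30, 50, 70, 90):
    f = pct / 100.0
    print(f"frequency {pct:3d}%  climatological score {climatological_half_brier(f):.4f}")
```

The room available for a forecaster to improve on climatology therefore shrinks as the precipitation frequency moves away from 50%.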

The  dependence  of  score  on  precipitation  frequency  noted  above  is 
in  contrast  to  that  found  by  Sanders  (1973)  and  by  Bosart  (1975). 
However,  the  reason  for  the  difference  is  probably  that  their 
samples  were  for  precipitation  variations  at  one  location. 
Experience  clearly  shows  that  forecasters  underplay  deviations  from 
normal,  thus  not  showing  the  full  effect  of  variations  in 
precipitation  frequency.  But  when  we  have  different  stations  with 
different  climatologies,  the  scores  generally  do  reflect  this 
difference,  except  as  seen  next. 

Figure  9  (from  Hughes,  1968b)  shows  the  same  type  of  data  for 
summer.  Note  that  there  is  no  appreciable  dependence  of  score  on 
precipitation  frequency.  Note  also  that  the  skill  score  is  much 
lower,  the  lowest  of  the  year,  averaging  about  half  that  of  winter. 
The  reason  for  the  lower  value  is  most  likely  areal  coverage 
differences,  as  discussed  earlier  using  Eq.  (1)  and  (2).  Because 
summer  rain  is  mostly  spotty  in  showers,  the  areal  coverage  is 
smaller than in winter; thus the average point probability tends to
be less, and the forecast probability can't deviate as far from the
climatic frequency, resulting in a lower skill score.

It  is  questionable  that  the  differences  in  score  in  summer  are  due 
to  variations  in  the  ability  of  the  forecasters  at  different  locations, 
but  the  reasons  are  unknown.  However,  nine  of  the  top  ten  scores 
are  from  Great  Lakes  states.  An  attempt  to  see  if  the  scatter  was 
related  to  differences  in  the  average  areal  coverage  among  stations 
was  not  conclusive  because  of  the  lack  of  adequate  data  for  many 
locations,  but  the  limited  data  suggested  that  no  major  areal 
coverage  differences  exist  in  the  region  in  the  summer.  When  use 
of  electronically  digitized  radar  systems  becomes  widespread,  data 
from  them  can  be  used  to  reasonably  evaluate  the  areal  coverage  for 
many  locations  somewhat  in  the  manner  used  by  D.  Smith  (1977),  and 
thus  prove  or  disprove  this  point. 
Figure 9.--Skill score vs. precipitation frequency (summer)

Further scatter might be eliminated if one were to normalize the
data  for  the  persistence  of  precipitation,  such  as  done  by  Glahn  and 
Jorgensen  (1970).     This  is  because  when  precipitation  extends  over 
a  long  period,  i.e.,   it  persists,   it  is  fairly  easy  to  forecast 
higher  probabilities,  and  thus  obtain  better  scores,   in  those 
forecast  periods  which  come  after  the  start  of  the  precipitation. 
Another  way  out  of  this  problem  would  be  to  use  a  skill   score  which 
compares  the  forecast  probabilities  to  the  persistence  probability 
rather  than  the  climatological   probability.     This  might  be  better 
for  the  first  period  forecast,  but  less  reasonable  for  other 
periods.     This  could  be  done  for  many  locations,  because  persistence 
probabilities  for  various  period  lengths  and  lead  times  exist  in 
the  work  of  Jorgensen  and  Klein  (1970). 
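
The idea of scoring against persistence instead of climatology can be sketched as follows. The sketch assumes the skill score is the percent improvement of the forecaster's half Brier score over that of the reference forecast; the function names, the sample forecasts, and the persistence probabilities are all hypothetical.

```python
def half_brier(probs, outcomes):
    """Mean squared difference between forecast probability and outcome (0 or 1)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def skill_score(forecast, reference, outcomes):
    """Percent improvement of `forecast` over `reference` (assumed definition)."""
    return 100.0 * (1.0 - half_brier(forecast, outcomes) / half_brier(reference, outcomes))

# Hypothetical sample of eight forecast periods (1 = measurable precipitation).
outcomes = [1, 1, 1, 0, 0, 0, 0, 1]
forecast = [0.6, 0.9, 0.9, 0.3, 0.1, 0.0, 0.1, 0.5]
climatology = [0.3] * 8
# Hypothetical persistence probabilities: 0.7 after a wet period, 0.2 after a dry one.
persistence = [0.2, 0.7, 0.7, 0.7, 0.2, 0.2, 0.2, 0.2]

s_clim = skill_score(forecast, climatology, outcomes)
s_pers = skill_score(forecast, persistence, outcomes)
print(f"skill vs. climatology: {s_clim:.1f}")
print(f"skill vs. persistence: {s_pers:.1f}")
```

When precipitation persists, the persistence reference is harder to beat than climatology, so the credit for forecasting high probabilities after the precipitation has started is reduced.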

Figure  8  and  the  above  discussion  show  that  the  skill   score  has  at 
times  some  dependence  on  precipitation  frequency  and  a  lot  of 
dependence  on  the  trace  frequency.     These  factors  should  be  allowed 
for  in  comparing  skill   scores  of  stations  because  they  are  factors 
over  which  the  forecasters  have  no  control.     The  NWS  Central  Region 
now  has  a  program  in  operation  (see  section  12)  which  compares  the 
forecasts  at  its  WSFOs  on  a  3-month  seasonal   basis  after  adjustment 
of  scores  for  precipitation  frequency,  the  frequency  of  small 
precipitation  amounts,  the  persistence  frequency,  and  the  quality  of 
the  MOS  guidance. 

g.     Characteristics  of  Precipitation  Probabilities 

As  a  result  of  points  made  earlier,  some  characteristics  of 
precipitation  probabilities  are: 

1.  If  there  is  to  be  perfect  correspondence  between  the  forecast 
probability and the relative frequency--perfect reliability--the
average  forecast  probability  must  equal   the  relative  frequency 
of  precipitation.     Therefore  probabilities  tend  to  be  lower  in 
drier  climates. 

2.  Probabilities  tend  to  be  lower  the  shorter  the  length  of  the 
forecast  period,  because  the  relative  frequency  is  usually  lower 
and item 1 above applies.

3.  Probabilities  tend  to  be  lower  as  the  chance  of  a  trace  amount 
increases  because  account  must  be  taken  of  the  possibility  that 
the  weather  system  will  yield  only  a  trace. 

4.  Probabilities  tend  to  be  lower  when  the  precipitation  is  by 
spotty  showers  rather  than  a  widespread  rain  shield  because  the 
probability  at  any  given  point  in  an  area  is  directly  related  to 
the  areal   coverage  of  the  precipitation. 
5. Probabilities tend to be lower when there is less persistence of
precipitation because certainty is directly related to persistence.

6. Probabilities tend to be less extreme and the use of the very
low and especially the very high values is less frequent as the
lead  time  to  the  forecast  period  increases,  until  eventually  the 
range  has  shrunk  to  the  single  value  of  the  climatic  frequency. 

9.  Improper  Bias  Reduction 

Many  of  the  problems  noted  here  have  dealt  only  with  bias--a  lack  of 
reliability.  When  these  biases  are  of  an  organized  type,  such  as 
those  discussed,  they  can  be  removed  or  reduced  by  a  vigorous 
verification  program,  resulting  in  a  better  score  and  more  useful 
forecasts.  However,  artificial  adjusting  of  bias  is  bad  because  it 
lowers  the  ultimate  worth  of  the  forecasts.  An  example  of  such 
improper  adjustment  would  be  to  intentionally  downgrade  a  100% 
probability  forecast  to  a  70%  forecast  simply  because  this  would 
improve  the  bias  of  the  70%  category  which  is  known  to  have  too  few 
precipitation events in past 70% forecasts. This is trying to play
the system, but, as mentioned earlier, it has been proven that the
system is not playable, and it hurts the skill score to do it in any
form.[8] Also, as the earlier forecasts are gradually dropped from
the set being verified, the bias of the 70% forecasts would change
sign.

To  properly  correct  the  70%  forecasts,  look  at  nearby  values.  If 
the  60%  and  80%  probabilities  have  about  enough  precipitation  events, 
or  too  many,  the  70%  problem  is  most  likely  random  error  and  it 
should  be  neglected  because  it  will  right  itself  eventually  without 
any  active  adjustment.  However,  if  some  surrounding  probabilities, 
particularly  higher  ones,  are  also  deficient  in  precipitation  events, 
then  over-forecasting  bias  is  likely  and  should  be  adjusted  for  in 
the  future  by  easing  off  the  usage  of  the  high  probabilities. 
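
The check of neighboring categories described above can be sketched as a small reliability table. The function name and the sample counts below are hypothetical; the sketch only tallies, for each forecast probability, how often precipitation was observed.

```python
from collections import defaultdict

def reliability_table(forecast_pcts, outcomes):
    """Return {forecast %: (number of forecasts, observed frequency in %)}."""
    counts = defaultdict(int)
    hits = defaultdict(int)
    for pct, wet in zip(forecast_pcts, outcomes):
        counts[pct] += 1
        hits[pct] += int(wet)
    return {pct: (counts[pct], 100.0 * hits[pct] / counts[pct])
            for pct in sorted(counts)}

# Hypothetical sample: the 70% category is short of precipitation events,
# but its neighbors (60% and 80%) verify about right.
forecasts = [60] * 10 + [70] * 10 + [80] * 10
outcomes = [1] * 6 + [0] * 4 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2

table = reliability_table(forecasts, outcomes)
for pct, (n, observed) in table.items():
    print(f"{pct:3d}%  n={n:3d}  observed={observed:5.1f}%  bias={pct - observed:+.1f}")
```

In this made-up sample only the 70% category is off, so by the reasoning above the deficiency is most likely random error and calls for no adjustment.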

Other systems have been tried and also found to hurt the skill score
and the utility of the forecasts. THERE ARE NO KNOWN WAYS TO
HONESTLY BEAT THE SCORING SYSTEM.

[8] This is easy to prove in this case, as follows: Once a forecast is
made and filed, there is no way the forecaster can change its score.
Thus all past forecasts will have the same score no matter what the
forecaster does on the present forecast. Therefore, since the 70%
forecast gets a poorer Brier score when rain occurs than a 100%
forecast, the score of the whole set would also be poorer if 70% is
forecast. This reasoning can be used to prove false any artificial
scheme involving past forecasts. Correctly "playing the numbers
game" involves only how to make the forecast under consideration get
a better score.

10.     Improving  on  MOS 

The  MOS  forecasts  should  be  verified  in  the  same  way  as  the 
forecaster's  forecasts.     This  will   then  show  the  bias  and  relative 
quality  of  the  MOS  forecasts.     However,  since  MOS  equations  are 
derived  on  a  sizable  sample  of  data  and  the  derivation  method 
forces  the  probabilities  to  be  essentially  reliable  on  the  dependent 
data,  any  biases  noted  may  be  transitory,   i.e.,  due  to  the 
idiosyncrasies of the smaller sample being verified, and not
universal and long-lasting. This is especially so where there are no
pronounced local effects in the area to which a particular set of
MOS equations applies. Where there could be major differences in
local   effects  in  a  MOS  area,  as  in  the  western  U.S.,  biases  noted 
could  be  long-lasting  and  should  be  worth  determining.     Eventually, 
single-station  equations  for  precipitation  probability,  as  now 
exist  for  maximum  temperature,  should  further  improve  MOS,  especially 
in  the  west. 

The  relative  quality  of  the  MOS  forecasts  compared  to  those  of  the 
forecasters  who  use  it  as  guidance  is  important  because  it  can  be 
an  additional   standard  to  judge  a  forecaster's  efforts.     This  in 
effect  is  saying  that  the  amount  of  "improvement"  of  the  forecasters 
on  MOS  could  be  a  universal  measure  of  quality.     The  difficulty 
with  this  is  that  MOS,  because  its  equations  are  for  areas,  not 
points,  may  have  its  most  difficult  time  in  places  where  there  are 
strong  local  effects,  while  the  forecasters  do  the  best  in  such 
places.     This  is  one  reason  why  the  normalization  procedure 
discussed  in  section  12  has  the  MOS  probability  as  a  variable  in  the 
regression  equation.     Nevertheless,  the  relative  quality  of  MOS  is 
important  because  decisions  as  to  whether  or  not  to  continue  to 
modify  MOS  will   be  made  from  such  data. 

From a listing of several years' forecasts made by the WSFOs in the
Central Region, it was noted when considering all three forecast
periods that in the cold season 25% to 50% of the time the forecasts
by the WSFOs were exactly the same as those of MOS. The higher
percentages  were  in  the  drier  portions  of  the  region,  and  the  range 
was  smaller  in  the  warm  season  with  its  smaller  range  of  precipitation 
frequencies   (see  Figs.   8  and  9).     Also,  about  96%  of  the  time  the 
WSFO  forecasts  were  within  30%  of  the  MOS  value.     Perhaps 
surprisingly,  none  of  these  values  changed  significantly  with  lead 
time,  on  the  average. 

Also  examined  was  the  portion  of  the  probability  range  where  the 
forecasters  made  the  most  improvement  on  the  MOS.     This  was  done  by 
taking  many  months  of  forecasts  and  putting  them  in  an  array  of  169 
boxes  dependent  on  both  the  13  MOS  and  the  13  forecasters' 
probabilities.     It  is  obvious  and  well   known  that  the  forecaster's 
improvement  on  MOS  is  greatest  for  the  first  period  of  the  forecast. 
From the arrays it was also noted that the improvement on MOS in
the first period was much greater for forecasts which were higher
than MOS. It was the same way, but to a lesser extent, in the second
period, with little difference in the third period. Interestingly,
this is in spite of fewer forecasts higher than MOS being made in
the first and second periods than in the third period. This strongly
suggests that the forecasters are quite a bit more successful in using
the extra data they have to improve on the beginning of the
precipitation event as compared with the ending of the event.
Naturally this success would diminish as the lead time to the
forecast increases.
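
A minimal sketch of such a 13-by-13 array is given below. It assumes the 13 allowable probabilities are those listed in Table 6 and that each box accumulates the number of forecasts and the total half Brier score of MOS minus that of the forecaster; the names and the four sample records are hypothetical.

```python
CATEGORIES = [0, 2, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
INDEX = {pct: i for i, pct in enumerate(CATEGORIES)}

def improvement_array(records):
    """records: iterable of (mos_pct, forecaster_pct, rain), rain being 0 or 1.

    Returns (counts, gain), two 13x13 lists indexed [MOS][forecaster].
    gain is the MOS half Brier score minus the forecaster's, summed over
    the forecasts in the box (positive means the forecaster did better).
    """
    n = len(CATEGORIES)
    counts = [[0] * n for _ in range(n)]
    gain = [[0.0] * n for _ in range(n)]
    for mos_pct, fcst_pct, rain in records:
        i, j = INDEX[mos_pct], INDEX[fcst_pct]
        counts[i][j] += 1
        gain[i][j] += (mos_pct / 100 - rain) ** 2 - (fcst_pct / 100 - rain) ** 2
    return counts, gain

# Hypothetical records: two raises of 20% to 40% (one wet, one dry),
# one unchanged 20%, and one lowering of 70% to 50% that verified dry.
records = [(20, 40, 1), (20, 40, 0), (20, 20, 0), (70, 50, 0)]
counts, gain = improvement_array(records)
i, j = INDEX[20], INDEX[40]
print(f"MOS 20% -> forecaster 40%: n={counts[i][j]}, total gain={gain[i][j]:+.2f}")
```

Boxes above the diagonal (forecaster higher than MOS) and below it can then be summed separately for each forecast period to make the comparison described here.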

One  way  to  maximize  the  forecaster  input  compared  to  MOS  is  to  be 
aware  of  what  the  scoring  system  is  saying  about  what  is  best  to  do 
for  the  current  forecast.  This  is  "playing  the  numbers  game"  the 
right  way,  by  working  to  get  the  best  current  forecast.  This  is 
good  if  scores  improve,  because  score  is  directly  related  to  utility, 
and  it  can  improve  scores.  Let  us  look  at  some  examples,  using  the 
Brier  scores  given  in  Table  6. 

As  said  earlier,  making  a  major  change  in  guidance  yields  a  big 
gain  if  it  is  in  the  right  direction.  This  is  fairly  obvious  and 
is  mostly  so  in  any  scoring  system,  but  the  squaring  part  of  the 
Brier  score  is  what  makes  it  so  advantageous.  Of  course,  if  the 
forecaster  is  wrong,  guidance  makes  the  big  gain  instead  of  the 
forecaster.  The  key  point  here  is  the  fact  that  changes  are  worth 
more  at  some  probabilities  than  at  others,  and  in  one  direction 
more  than  the  other.  Proper  use  of  this  can  improve  scores.  For 
example,  correctly  changing  MOS  of  10%  to  0%  gains  the  forecaster 
only  0.01  units  of  Brier  score,  but  correctly  changing  80%  to  70% 
gains 0.15 units (.64 - .49)--15 times as much. Also, correctly
changing  MOS  guidance  of  10%  to  20%  gains  17  times  as  much  (.81  - 
.64)  as  changing  it  correctly  to  0%.  Not  to  take  these  points  into 
account  in  changing  guidance  is  like  playing  craps  in  Vegas  without 
knowing  the  odds  of  various  rolls--you  go  broke.  These  points  are 
probably  the  reason  that  consensus  forecasts  do  so  well.  But  if  a 
stupid  thinker  like  consensus  can  do  well,  the  thinking  forecaster 
should  be  able  to  do  better  if  acting  with  full  knowledge  of  the 
consequences  of  various  actions. 
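
The gains quoted in this paragraph follow directly from the half Brier scores of Table 6 and can be checked with a few lines of Python. The function names are illustrative only.

```python
def half_brier(prob_pct: float, rain: int) -> float:
    """Half Brier score of one forecast; rain is 1 if measurable, else 0."""
    return (prob_pct / 100.0 - rain) ** 2

def gain(mos_pct: float, new_pct: float, rain: int) -> float:
    """Score gained by the forecaster over MOS (positive is better)."""
    return half_brier(mos_pct, rain) - half_brier(new_pct, rain)

g_10_to_0 = gain(10, 0, rain=0)    # lowering 10% to 0%, no rain observed
g_80_to_70 = gain(80, 70, rain=0)  # lowering 80% to 70%, no rain observed
g_10_to_20 = gain(10, 20, rain=1)  # raising 10% to 20%, rain observed

print(f"10% -> 0%  (dry): {g_10_to_0:.2f}")
print(f"80% -> 70% (dry): {g_80_to_70:.2f}")
print(f"10% -> 20% (wet): {g_10_to_20:.2f}")
```

The same 10-point change is worth very different amounts depending on where it starts and which way it goes, which is the key point made above.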

To  look  at  it  a  different  way  and  show  why  consensus  works  well, 
look  at  this  example.  If  you  change  a  10%  MOS  forecast  to  0%,  you 
gain  .01  if  you  are  right,  and  lose  .19  if  wrong  (1.0  minus  .81). 
Thus if you are correct in such changes only 19 times out of 20,
you have gained nothing for your effort. You must do better than
19 out of 20. On the other hand, if you were to raise the 10% to
20%,  you  gain  .17  if  right  (.81  -  .64),  and  lose  only  .03  if  wrong 
(.04  -  .01).  Thus  you  need  to  be  right  only  one  time  out  of  six 
such  changes,  a  dramatic  difference.  All  these  points  have  been 
discussed  in  detail  by  Hughes  (1969b)  and  summed  up  as: 
Table 6. Half Brier Score.

PROBABILITY %    NO "RAIN" OBSERVED    "RAIN" OBSERVED

      0                .00                  1.00
      2                .0004                .9604
      5                .0025                .9025
     10                .01                  .81
     20                .04                  .64
     30                .09                  .49
     40                .16                  .36
     50                .25                  .25
     60                .36                  .16
     70                .49                  .09
     80                .64                  .04
     90                .81                  .01
    100               1.00                  .00

1. All correct changes to guidance are a gain to the forecaster,
and  even  small  gains  are  significant  because  the  average  gain  over 
guidance  is  usually  small. 

2.  A  correct  change  in  the  probability  guidance  that  is  initially 
in  the  direction  of  50%  will  net  more  to  the  forecaster  than  a  like 
change  away  from  50%,  with  the  amount  of  the  gain  greater  the  farther 
the  guidance  value  is  from  50%.  Or,  a  small  correct  change  toward 
50%  is  usually  of  more  value  than  a  much  larger  correct  change  away 
from  50%. 

3.  Verifications  show  that  forecasters  make  changes  away  from  50% 
much  more  often  than  toward  50%.  This  may  be  because  of  the 
conservativeness  of  the  guidance  or  the  lack  of  awareness  of  the 
forecaster  of  the  opportunity  for  gain  with  changes  toward  50%. 

4. THEREFORE, forecasters should be alert for opportunities to
make changes toward 50%, and they should be aware of how conservative
they  must  be  in  making  changes  away  from  50%  if  they  are  to  make 
such  efforts  of  value. 
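
How conservative one must be can be put in numbers with a break-even calculation: the fraction of changes that must verify in the direction of the change for the forecaster just to match MOS. The sketch below uses the half Brier score of Table 6; the function names are illustrative only.

```python
def half_brier(prob_pct: float, rain: int) -> float:
    """Half Brier score of one forecast; rain is 1 if measurable, else 0."""
    return (prob_pct / 100.0 - rain) ** 2

def break_even_fraction(mos_pct: float, new_pct: float) -> float:
    """Fraction of changes that must verify in the direction of the change
    for the changed forecasts to score the same as the MOS values."""
    if new_pct == mos_pct:
        raise ValueError("the forecast must differ from MOS")
    right = 1 if new_pct > mos_pct else 0   # outcome that rewards the change
    wrong = 1 - right
    win = half_brier(mos_pct, right) - half_brier(new_pct, right)
    loss = half_brier(new_pct, wrong) - half_brier(mos_pct, wrong)
    return loss / (win + loss)

away = break_even_fraction(10, 0)     # change away from 50%
toward = break_even_fraction(10, 20)  # change toward 50%
print(f"10% -> 0%:  must be right more than {away:.0%} of the time")
print(f"10% -> 20%: must be right more than {toward:.0%} of the time")
```

For the two changes discussed earlier this gives 19 out of 20 for lowering 10% to 0%, and about one in six or seven for raising 10% to 20%.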

These  points  may  seem  to  be  contrary  to  improving  the  resolution  of 
the forecasts, but this is not so. All changes to MOS which are in
the right direction, i.e., toward zero if it doesn't rain or toward
100% if it does, improve on MOS through resolution regardless of
whether the changes are toward 50% or away from 50%. This can be
proven for sets of forecasts as follows: Take a set of perfectly
reliable MOS forecasts of 30%. Assume the forecaster makes changes
only toward 50%, say only to 40%. The set being changed to 40% will
have a better Brier score if the observed frequency of precipitation
on the set is over 35.0%, i.e., more than halfway from the MOS value.
The  set  of  unchanged  forecasts  will  get  a  better  score  if  the  set 
removed  has  a  frequency  larger  than  30.0%.  Thus  both  sets  can  have 
better  Brier  scores,  and  yet  both  sets  can  have  (and  at  least  one 
set  will  have)  imperfect  reliability.  We  can  therefore  say  that  the 
resolution  has  improved  so  much  that  it  overcomes  the  reliability 
loss.  Clearly,  then,  going  toward  50%  can  improve  resolution. 
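
The 35% threshold in this argument can be checked numerically. For a set of forecasts all equal to p, verified on occasions with observed frequency f, the average half Brier score is f(1 - p)^2 + (1 - f)p^2. The sketch below applies this to a hypothetical sample of 100 MOS forecasts of 30%; the counts are made up for illustration.

```python
def expected_half_brier(forecast: float, frequency: float) -> float:
    """Average half Brier score of a constant forecast on a set of
    occasions whose observed precipitation frequency is `frequency`."""
    return frequency * (1.0 - forecast) ** 2 + (1.0 - frequency) * forecast ** 2

# Hypothetical sample: 100 MOS forecasts of 30%, 30 of them wet (reliable).
# The forecaster raises 40 of them to 40%; 16 of those 40 verify wet.
n_total, wet_total = 100, 30
n_changed, wet_changed = 40, 16
n_kept, wet_kept = n_total - n_changed, wet_total - wet_changed

before = expected_half_brier(0.30, wet_total / n_total)
after = (n_changed * expected_half_brier(0.40, wet_changed / n_changed)
         + n_kept * expected_half_brier(0.30, wet_kept / n_kept)) / n_total

print(f"MOS alone:        {before:.4f}")
print(f"after changes:    {after:.4f}")
print(f"40% vs 30% at f=35%: {expected_half_brier(0.40, 0.35):.4f} "
      f"vs {expected_half_brier(0.30, 0.35):.4f}")
```

In this sample the changed set verifies at 40% and the unchanged set at about 23%, so the overall score improves even though the unchanged 30% forecasts are no longer perfectly reliable.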

A  listing  of  all  the  possible  gains  and  losses  is  given  in  Tables  7 
(a)  and  (b).  These  were  created  by  Wilbur  Wray  of  NSSFC,  who  said 
that  he  had  improved  his  score  by  use  of  the  tables.  It  would  be 
wise  to  have  these  tables  at  the  forecast  desk  for  easy  reference. 

11. Trend in Scores

Have scores improved due to this type of verification and other
factors?  Since  there  have  been  no  breakthroughs  in  forecasting,  one 
would  expect  only  small  changes  from  year  to  year,  with  some  ups 
and  downs.  Because  of  this,  it  was  decided  to  take  4-year  average 
scores for the first period forecast (when a 12-h forecast), and
compare  the  post-MOS  4-year  result  (1972-75)  with  the  pre-MOS 
4-year  period  immediately  previous.     This  was  done  for  five  WSFOs 
in  the  Central   Region   (Milwaukee,  Chicago,  Louisville,  St.   Louis, 
Denver). 

The  skill   scores  for  each  station  were  obtained  for  each  calendar 
year  and  then  averaged  for  each  4-year  period,  with  the  result 
shown  in  Table  8.     Note  that  four  of  the  five  stations  have  improved 
local scores from the earlier period (ΔLCL). The 3-year score 1973-75
gave even better results. Note also that the MOS objective
guidance (ΔGUID) had materially improved on its subjective predecessor
at only two locations. Also, the sizable difference between the
scores of MOS and the forecaster (LCL-GUID) shows that MOS had a long
way  to  go  to  be  competitive  for  the  first  period  forecast. 

These  forecaster  improvements  with  time  are  contrary  to  that  found 
by  others,   for  example  Sanders   (1973)  and  Cook  and  Smith  (1977). 
However,  Sanders  and  Cook  and  Smith  are  correct  for  their  single- 
station  sample  (Station  A  in  Table  8  is  the  one  Cook  and  Smith  used), 
but  generalizing  using  one  location  has  its  dangers.     However,   as 
we  will   see  next,  while  their  conclusions  were  not  general   at  the 
time,  they  may  be  more  general   now,  although  the  five  locations 
used  here  are  still   not  a  large  sample  of  stations  and  are  not  very 
geographically  dispersed. 

Table  9  is  an  update  of  Table  8  prepared  for  the  2-year  period  of 
April   1976  through  March  1978  (the  period  just  after  that  of  Cook 
and  Smith)  by  averaging  the  skill   score  from  the  first  period 
forecasts  of  a  12-month  warm  season  set  and  a  12-month  cold  season 
set  of  forecasts.     Note  that  station  A  (used  by  Cook  and  Smith) 
made  the  most  substantial   gain,  but  its  MOS  guidance  gained  even 
more.     Note  also  in  the  average  gain  that  the  WSFOs  had  about  a 
zero  value,  some  going  up  and  some  down,  while  the  MOS  gained  a  bit 
at  every  location.     Thus  the  forecasters  changed  little,  while  the 
MOS  scores  continued  to  improve,  narrowing  the  margin  between  them. 
MOS  is  certainly  competitive  now,  especially  when  it  is  realized 
that  the  forecasters  have  the  advantage  of  later  data  and  that  their 
gain,   if  any,  would  be  smaller  for  the  second  and  third  periods  of 
the  forecast. 

12.     Forecast  Comparisons 

Panofsky  and  Brier  (1963)   discussed  the  pitfalls  of  verification  and 
indicated  that  one  of  the  greatest  dangers  lies  in  attempts  to 
compare  relative  abilities  when  there  are  climatological   differences. 
This  danger  still   exists  in  probability  forecasts.     The  climatological 
factors  of  importance  in  precipitation  forecast  comparisons  were 

Table 8. Average station (LCL) and guidance (GUID) skill
scores for 1972-75 and change in score from
1968-71, all for first 12 h period.

                                      Change from 1968-71
WSFO     LCL     GUID    LCL-GUID      ΔLCL      ΔGUID

A         40      35         5          -6        -2
B         42      32        10           2         5
C         43      30        13           4         1
D         46      34        12           9         9
E         34      22        12           7        -1

Table 9. Average station (LCL) and guidance (GUID) skill
score from April 1976-March 1978 and change from
1972-75 period (first 12 h period only).

                                      Change from 1972-75
WSFO     LCL     GUID    LCL-GUID      ΔLCL      ΔGUID

A        45.5    41.5       4.0         5.5       6.5
B        38.5    33.5       5.0        -3.5       1.5
C        41.5    35.0       6.5        -1.5       5.0
D        48.0    40.5       7.5         2.0       6.5
E        29.5    24.5       5.0        -4.5       2.5

Average                                -0.4       4.4

discussed earlier (section 8), and earlier yet it was mentioned
that  the  climatological  skill  score  removes  some  dependence  on 
precipitation  frequency.  So  if  comparisons  are  made  using  this 
score,  comparing  forecasters  at  the  same  location  is  the  safest; 
but  even  then  one  needs  a  sizable  sample  (2  years  of  regular  shift 
forecasting?)  to  be  able  to  reasonably  neglect  climatic  differences. 
Comparing  offices  is  much  more  difficult,  and  having  a  large  sample 
size  is  by  no  means  adequate.  This  is  obvious  from  the  discussion 
of  the  various  climatic  factors  given  earlier. 

The  safest  way  to  compare  offices  is  to  adjust  for  the  climatic 
differences  in  some  manner.  Hughes  and  Sangster  (1969b)  discuss 
in  detail  a  method  for  doing  this  by  screening  regression  to  select 
and weight the climatic effects. The terms in the equations
involve precipitation frequency, frequency of small amounts of
precipitation  and  persistence  of  precipitation,  all  in  a  variety 
of  forms  and  involving  long-term  and  short-term  (sample)  values. 
The  score  of  MOS  was  also  included.  However,  even  such  an 
adjustment  is  not  fully  definitive,  as  they  added  a  subjective 
adjustment  to  the  objective  ranking.  It  is  likely  though,  that 
when  a  number  of  offices  are  compared  on  a  sizable  sample,  those  at 
the  top  of  the  list  should  have  something  going  for  them  other 
than  climatology,  especially  compared  to  those  offices  near  the 
bottom  of  the  list.  They  also  found  that  a  comparative  score 
which  is  simply  the  difference  in  the  Brier  scores  was  little 
affected  by  the  climatological  factors  they  used. 

13.  Combining  Probabilities 

There  are  times  when  users  need  a  probability  for  a  period  that  is 
considerably  longer  than  our  usual  6  or  12  h  period  for  which  we 
issue  probabilities  for  some  parameters.  How  to  do  this  with  the 
precipitation  probabilities  routinely  issued  has  been  discussed  in 
depth  by  Hughes  and  Sangster  (1974,  1979a).  The  essence  of  this 
and  its  main  results  are  given  here  for  convenience. 

An equation for the 3-period probability from three 1-period
probabilities  is,  according  to  a  basic  addition  formula  for 
probabilities: 

$$P_{123} = P_1 + P_2 + P_3 - P_1P_2 - P_1P_3 - P_2P_3 + P_1P_2P_3 \qquad (9)$$

Here $P_1$, $P_2$, and $P_3$ are the probabilities (as decimal fractions) for
the three periods separately and $P_{123}$ is the probability for
the three periods taken as a whole. If a 2-period probability
combination is desired, the terms on the right involving $P_3$ should
be omitted. Incidentally, this equation is appropriate whether or
not  the  three  periods  are  contiguous  or  are  the  same  length.  The 
main  problem  with  using  this  equation  for  the  task  at  hand  is  that 
it  assumes  the  events  are  independent  of  each  other,  an  assumption 
not  likely  to  be  correct  for  precipitation  for  the  three  contiguous 
periods of a weather forecast. It also assumes the probabilities
are reliable.
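
As a small numerical check of Eq. (9), the sketch below evaluates it term by term and compares it with the equivalent "one minus the probability of no precipitation in any period" form, which extends to any number of periods. The function names are illustrative only, and both functions carry the same independence assumption as Eq. (9).

```python
from math import prod

def combine_independent(probs):
    """Probability of precipitation in at least one period, assuming the
    periods are independent. `probs` are decimal fractions (0.0 to 1.0)."""
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be decimal fractions in [0, 1]")
    # P(at least one) = 1 - P(none) = 1 - product of (1 - p_i)
    return 1.0 - prod(1.0 - p for p in probs)

def eq9(p1, p2, p3):
    """Eq. (9) written out term by term."""
    return p1 + p2 + p3 - p1 * p2 - p1 * p3 - p2 * p3 + p1 * p2 * p3

p1, p2, p3 = 0.30, 0.50, 0.20
print(round(eq9(p1, p2, p3), 3))                    # 0.72
print(round(combine_independent([p1, p2, p3]), 3))  # 0.72
print(round(combine_independent([p1, p2]), 3))      # 0.65 (2-period case)
```

The two forms agree because expanding the product (1 - P1)(1 - P2)(1 - P3) gives exactly the seven terms of Eq. (9).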

To determine the effect of these assumptions, the actual locally
made 12 h forecasts for 28 stations distributed rather uniformly
across the north-central U.S. for one warm season (April-September)
and one cold season (October-March) were analyzed. Each of the
seven terms on the right side of the above equation was determined
from the actual forecasts and then fed into a regression program
as predictors, with the observed precipitation as the predictand.
The  regression  estimate  of  the  36  h  probability  for  the  warm 
season  is: 

$$P_{123} = 0.03 + 1.04 P_1 + 0.81 P_2 + 0.97 P_3 - 0.86 P_1 P_2 - 1.35 P_1 P_3 - 0.76 P_2 P_3 + 1.19 P_1 P_2 P_3 \tag{10}$$

Note that the terms in the equation have the same signs as in the
independent equation given earlier and that the coefficients are
quite close to the 1.0 of the independent equation. The constant
is not the zero value of Eq. (9), but .03 (3%), and is probably
brought about by a lack of reliability of the forecasts in the
very low probability numbers. However, it is small enough that
it will not have an appreciable effect on the final result.

With  this  equation,  one  could  calculate  the  36  h  probability  for 
any  warm  season  set  of  three  12  h  probabilities.  The  cold  season 
result was not as nice in that its coefficients were much farther
from 1.0, and the signs were not always the same as in the independent
equation. The result from these warm and cold season equations
indicates  that  the  warm  season  events  are  nearly  independent,  one 
period  from  another,  while  in  the  cold  season  they  are  much  more 
dependent.  This  is  what  one  would  suspect  from  the  longer  time 
span  of  precipitation  in  the  cold  season. 

To  get  around  this  and  other  problems  with  this  approach,  another 
basic  equation  was  used,  given  as: 


$$P_{1+2} = P_1 + P_2 - P_{12}$$

This is for combining two periods only, but it can be repeated any
number of times. $P_{1+2}$ is the probability of precipitation in at
least one of the two periods, and $P_{12}$ is the probability of
precipitation occurring in both periods 1 and 2. The equation thus
does not have the assumption of independence of the events. Using
the observed frequencies of precipitation in the same sets of
probability forecasts as before, a warm season and a cold season
value of $P_{12}$ was calculated, including an adjustment for the
bias of the probabilities. Tables 10 and 11 result from the use of
the equations, and Tables 12 and 13 show the amount of deviation
from independent events.
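
The two-period relation can be sketched in a few lines. The sketch assumes probabilities as decimal fractions, and the joint value of 0.25 is a made-up number chosen only for illustration, not a frequency taken from the forecast sample. The function name is hypothetical.

```python
def combine_two(p1, p2, p12):
    """P(precipitation in period 1 or period 2) = P1 + P2 - P12.
    No independence assumption is made; p12 is the probability of
    precipitation in both periods."""
    # p12 cannot exceed either single-period probability, and it cannot be
    # smaller than p1 + p2 - 1 (otherwise the result would exceed 1).
    lo, hi = max(0.0, p1 + p2 - 1.0), min(p1, p2)
    if not lo - 1e-12 <= p12 <= hi + 1e-12:
        raise ValueError(f"p12 must lie in [{lo:.2f}, {hi:.2f}]")
    return p1 + p2 - p12

p1, p2 = 0.40, 0.50
independent = combine_two(p1, p2, p1 * p2)  # independence means P12 = P1 * P2
dependent = combine_two(p1, p2, 0.25)       # illustrative joint probability
print(round(independent, 2))                # 0.7
print(round(dependent, 2))                  # 0.65
print(round(dependent - independent, 2))    # -0.05
```

Whenever precipitation in one period makes it more likely in the other, P12 exceeds P1 times P2 and the combined probability falls below the independent value, which is the sign of the deviations shown in Tables 12 and 13.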

Once  the  24  h  probability  is  obtained  from  a  table,  the  table  can 
be  used  again  with  that  value,  interpolating  as  necessary,  and  the 
third  probability  to  get  a  probability  for  a  36  h  period,  etc. 
Note  that  the  warm  season  events  are  still  fairly  close  to 
independent,  and  the  cold  season  is  close  except  when  both 
probabilities  are  middle  values.  The  tables  can  be  used  with  those 
forecasts  starting  with  a  6  h  period,  but  since  the  independence 
is  slightly  less  with  a  6  h  period  than  with  a  12  h  period,  the 
final  probability  should  be  a  bit  lower  (maximum  2%)   than  the 
tables  indicate. 
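
The repeated table lookup can be sketched as follows. Because Table 12 is defined as Table 10 minus the independent values, the sketch rebuilds a warm-season combination as the independent value plus the Table 12 deviation, so it matches Table 10 only to within rounding. Bilinear interpolation between table entries is an assumption of this sketch (the text says only "interpolating as necessary"), and the function names are illustrative.

```python
from bisect import bisect_right

GRID = [0, 2, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # percent
# Table 12 (warm season): combined probability minus the independent value,
# in percent. Rows are PROB1 and columns PROB2, both following GRID.
# Entries printed as "-0" in the table are entered as 0.
DEV = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, -1, -1, -1, -1, 0, 0, 0, 0, 0],
    [0, 0, 0, -1, -1, -1, -1, -1, -1, -1, -1, 0, 0],
    [0, 0, -1, -1, -2, -3, -3, -2, -2, -2, -1, -1, 0],
    [0, 0, -1, -1, -3, -4, -4, -3, -3, -2, -2, -1, 0],
    [0, 0, -1, -1, -3, -4, -5, -5, -4, -3, -2, -1, 0],
    [0, 0, -1, -1, -2, -3, -5, -6, -5, -4, -3, -1, 0],
    [0, 0, 0, -1, -2, -3, -4, -5, -6, -5, -3, -2, 0],
    [0, 0, 0, -1, -2, -2, -3, -4, -5, -6, -4, -2, 0],
    [0, 0, 0, -1, -1, -2, -2, -3, -3, -4, -4, -2, 0],
    [0, 0, 0, 0, -1, -1, -1, -1, -2, -2, -2, -3, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]

def _bracket(x):
    """Return (i, t) with GRID[i] <= x <= GRID[i+1] and fractional position t."""
    i = min(bisect_right(GRID, x) - 1, len(GRID) - 2)
    return i, (x - GRID[i]) / (GRID[i + 1] - GRID[i])

def deviation(p1, p2):
    """Bilinear interpolation of DEV at (p1, p2), both in percent."""
    i, t = _bracket(p1)
    j, u = _bracket(p2)
    top = DEV[i][j] * (1 - u) + DEV[i][j + 1] * u
    bottom = DEV[i + 1][j] * (1 - u) + DEV[i + 1][j + 1] * u
    return top * (1 - t) + bottom * t

def combine_warm(p1, p2):
    """Two-period warm-season probability in percent."""
    independent = p1 + p2 - p1 * p2 / 100.0
    return independent + deviation(p1, p2)

p24 = combine_warm(40, 50)   # two 12 h periods give the 24 h value
p36 = combine_warm(p24, 30)  # reuse with the third period for 36 h
print(round(p24), round(p36))  # 65 73
```

In the second step the 24 h value of 65% falls between the 60 and 70 rows, so the deviation is interpolated halfway between -3 and -2.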

A  quick  estimate  of  the  36  h  probability  can  be  obtained  from  Table 
14  and  some  judgement.  Start  with  the  highest  of  the  three 
probabilities.  This  gives  the  lower  limit  for  the  36  h  probability. 
The  second  column  gives  the  highest  possible  36  h  probability  and 
occurs  when  all  three  probabilities  are  the  same  as  that  in  the 
first  column,  and  are  independent.  In  such  a  case  a  downward 
adjustment  for  dependence  (max  10  percent  in  middle  probabilities 
and  zero  at  the  extremes)  will  give  the  desired  result.  If  the 
three  probabilities  are  not  the  same,  one  can  use  the  value  in  the 
right  hand  column.  This  value  is  obtained  by  adding  half  the  range 
to  the  highest  probability  (first  column)  and  reducing  this  mainly 
for  dependence  of  events  and  with  some  rounding  for  convenience  in 
memory.  These  figures  are  probably  more  appropriate  for  the  cold 
season,  so  the  mid-values  should  be  a  bit  higher  for  warm  season 
showers  because  they  are  more  independent.  Notice  that  for  first 
column  probabilities  of  20-70%,  the  rounded  last-column  values  are 
simply the highest single probability plus 10%--easy to remember.
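
The quick estimate can be sketched as below, using only what the text states: the lower limit, the independent upper limit when all three probabilities equal the highest, and the "plus 10%" shortcut for highest values of 20-70%. Outside that range the sketch returns no rule-of-thumb value, since those figures come from Table 14 itself. The function name is illustrative.

```python
def quick_36h_estimate(p1, p2, p3):
    """Return (lower limit, independent upper limit, rule-of-thumb value)
    for the 36 h probability, all in percent."""
    highest = max(p1, p2, p3)
    lower = highest
    # Upper limit: all three periods equal to the highest value and independent.
    upper = 100.0 * (1.0 - (1.0 - highest / 100.0) ** 3)
    # Shortcut from the text, stated only for a highest value of 20-70%.
    rule = highest + 10 if 20 <= highest <= 70 else None
    return lower, round(upper), rule

print(quick_36h_estimate(20, 40, 30))  # (40, 78, 50)
```

The rule-of-thumb value still needs the judgement described above, for example a slightly higher figure for warm season showers.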

14.  Problems  for  the  Future 

We  can  easily  see  what  to  do  about  bias,  and  that  part  has  been 
pushed  vigorously.  However,  experience  clearly  shows  that  only  a 
small  gain  in  Brier  score  can  result  from  such  changes  (see  Hughes 
1965, and Sanders 1963 and 1973). But to consider small gains
insignificant is unwise. First, the history of forecast
improvement is one of small gains only. Second, small gains can be
economically significant to a large population of decision makers.

Table 10. Combination probabilities (percent) for warm season.

Table 11. Combination probabilities (percent) for cold season.

Table 12. Probabilities from Table 10 minus those for independent
events.

                                   PROB2
PROB1    0    2    5   10   20   30   40   50   60   70   80   90  100
    0    0    0    0    0    0    0    0    0    0    0    0    0    0
    2    0   -0   -0   -0   -0   -0   -0   -0   -0   -0   -0   -0    0
    5    0   -0   -0   -0   -1   -1   -1   -1   -0   -0   -0   -0    0
   10    0   -0   -0   -1   -1   -1   -1   -1   -1   -1   -1   -0    0
   20    0   -0   -1   -1   -2   -3   -3   -2   -2   -2   -1   -1    0
   30    0   -0   -1   -1   -3   -4   -4   -3   -3   -2   -2   -1    0
   40    0   -0   -1   -1   -3   -4   -5   -5   -4   -3   -2   -1    0
   50    0   -0   -1   -1   -2   -3   -5   -6   -5   -4   -3   -1    0
   60    0   -0   -0   -1   -2   -3   -4   -5   -6   -5   -3   -2    0
   70    0   -0   -0   -1   -2   -2   -3   -4   -5   -6   -4   -2    0
   80    0   -0   -0   -1   -1   -2   -2   -3   -3   -4   -4   -2    0
   90    0   -0   -0   -0   -1   -1   -1   -1   -2   -2   -2   -3    0
  100    0    0    0    0    0    0    0    0    0    0    0    0    0

Table 13. Probabilities from Table 11 minus those for independent
events.

Table 14. 36-h probability combination estimate.
Column headings: ... (lowest 36-h) | Highest 36-h Prob. (independent) | Range | Highest ...

Much of the deviation of the skill score from perfection would be
eliminated if the score were changed to that recommended here
(section 7e) by using areal coverage instead of 0 and 1 in the
Brier  score,  but  that  is  only  a  cosmetic  change  for  a  psychological 
and  logistic  advantage.  The  remaining  deviation  of  the  score  from 
perfection  would  be  due  to  the  lack  of  knowledge  of  forecasting. 
Increasing  the  basic  knowledge  of  so  many  duty  forecasters  is  a 
very difficult task. It involves getting a better understanding
of  meteorological  processes,  making  better  use  of  observations, 
especially  of  radar  and  satellite,  and  getting  a  better  understanding 
of  the  strengths  and  weaknesses  of  the  various  forms  of  numerical 
and  statistical  guidance  given  to  the  forecaster. 

Comprehensive  verification  should  continue  so  as  to  help  the  new 
forecaster  reduce  bias  and  to  keep  the  experienced  forecaster  on 
his/her toes. But we are now at the point where, if the forecaster
is  to  remain  ahead  of  the  inevitably  improving  guidance  material 
and  thus  contribute  to  the  forecast,  the  "state  of  the  art"  must 
improve.  Comparative  verification  and  station  ranking  will  help 
encourage  forecasters  to  a  resurgence  of  effort.  However,  the 
main  way  to  go  is  through  training,  formal  or  otherwise.  The 
forecaster  also  must  have  time  in  the  forecast  routine  to  use  the 
knowledge  gained.  It  appears  that  field  office  automation  will 
help  do  this,  as  well  as  provide  tools  heretofore  unavailable  to 
the  forecaster,  which  could  also  help. 

Since  the  main  value  of  the  human  forecaster  will  be  in  the  first 
part  of  the  forecast,  it  is  also  likely  that  a  return  to  more 
diagnostics  is  in  order.  That  is,  obtaining  a  fuller  understanding 
of  what  exists  at  the  initial  time,  and  why  it  exists  would  lead 
to  better  short  term  forecasts.  The  "why"  is  the  harder  part  and 
it  will  require  more  emphasis  on  the  use  of  initial  charts, 
satellite,  radar,  and  peripheral  data,  and  less  on  progs.  Properly 
answering  the  "why"  will  also  require  considerable  meteorological 
understanding,  and  thus  will  require  well-trained  meteorologists. 
With  the  gradually  improving  guidance,  forecasters  are  approaching 
a  "sink  or  swim"  period,  as  seen  for  precipitation  forecasting  by 
Tables  5  and  6,  and  treading  water  won't  do  much  longer.  Things 
will  have  to  be  done  differently  or  they  may  not  be  done  by  people. 
Automation  of  Field  Operations  and  Services  (AFOS)  should  help  here, 
too,  because  it  will  require  changing  old  habits  and  hopefully, 
the  new  ones  will  be  more  to  the  needs  of  the  present.  Probability 
may  well  play  a  larger  role  in  the  future.  However,  Dexter  (1962) 
showed  how  not  to  do  it  when  he  cluttered  the  forecast  with  so 
many  numbers  that  chaos  resulted.  What  is  needed  is  an  approach 
which  is  gradual  and  perhaps  implicit  (see  Hughes  1978a)  so  as  to 
get  the  information  the  user  needs  for  the  best  decision  making. 

Probabilities  for  the  public  need  to  expand  to  give  probabilities 
of  various  precipitation  amounts.  This  and  other  limitations  of 
the present NWS probability program are discussed by Myers (1974).
While  not  in  favor  now,  there  may  well  come  a  time  when  probability 
will  be  used  in  warnings.  There  already  is  a  paper  on  this  by 
Franceschini  (1960)  on  thunderstorm  warnings,  and  Barton  (1956) 
noted  that  we  must  eliminate  loose  terminology  because  a  "warning 
defeats  itself  if  people  are  confused."  What  is  the  first  thing 
you  do  when  you  hear  a  warning  for  your  area?  If  you  are  like  the 
people  interviewed  in  disaster  surveys,  you  take  a  look  outside  to 
see  what  it  looks  like.  In  effect  you  are  trying  to  establish  a 
probability  so  as  to  know  where  the  threat  is  compared  to  your  C/L 
ratio.  Do  you  duck  or  just  remain  cautious  and  watching?  If  the 
event  warned  about  is  too  far  to  be  seen  or  identified  out  the 
window,  like  a  flash  flood  or  hurricane,  people  seek  confirmation 
(probability)  from  other  sources.  The  public  is  thus  ripe  for 
more information in probability terms. For discussion on this
with regard to the so-called Boulder Wind events, see Hughes (1978b).
For very recent thoughts on probability from several people, see
Pielke  (1977)  and  responses  to  this  by  M.  Smith  (1977),  Murphy 
(1978c),  Hughes  (1978a),  M.  Smith  (1978),  and  Pielke  (1978). 

As  mentioned  near  the  beginning,  Brier  (1944)  said,  "The  decisions 
of  a  rational  man  will  to  a  large  extent  depend  upon  his  estimates 
of  the  probabilities  of  the  different  events  and  the  consequences 
of  them.  When  he  is  convinced  that  the  weather  forecaster's 
estimates  of  these  probabilities  are  better  than  his  own,  he  will 
come to him for weather information." He also said in closing that
"so far as the scientific problem of weather forecasting is
concerned, the forecaster's duty ends with providing accurate and
unbiased  estimates  of  the  probabilities  of  different  weather 
situations."  Thus  we  must  do  it  well,  and  it  is  likely  that  we 
will  do  it  more  in  the  future. 

15.  Public  Education 

Many  NWS  forecasters  feel  that  the  public  needs  education 
concerning  probability  forecasts,  and  the  need  for  such  education 
has  been  suggested  recently  in  the  meteorological  literature,  e.g., 
Murphy  (1977b).  We  all  know,  and  in  the  same  reference  Murphy 
mentions,  that  there  is  a  lack  of  understanding  of  weather 
forecasts  by  the  public  whether  they  are  probabilistic  or  not. 
There  will  always  be  some  misunderstanding.  The  point  I  want  to 
make  here  is  that  while  such  education  may  seem  necessary,  it  may 
only  be  desirable.  As  long  as  we  put  out  the  type  of  forecasts 
needed by the decision makers--point probability forecasts--it is
sufficient that the public know only how to use them. I also
contend, as mentioned earlier, that the public does know how to
use them because, unknowingly, they have made probabilistic
decisions all along.

It  is  rather  easy  to  know  how  to  use  probability  forecasts,  as  was 
discussed to some extent under the cost/loss ratio concept (section
3b), and I am convinced that the public does know how to use the
probability  forecasts  because  doing  so  is  so  simple.  If  the  public 
understands  how  to  use  them,  is  any  education  program  worthwhile? 
Yes,  mainly  because  of  the  confidence  in  the  forecasts  that  such 
knowledge  will  help  build,  and  possibly  also  because  some  of  the 
details  of  the  program,  for  example  that  we  are  forecasting  for 
measurable  precipitation,  may  be  of  value  to  some  decision  makers. 

What  should  be  discussed?  The  most  obvious  thing  is  the  quality 
of  the  forecasts  in  the  manner  most  obvious  to  the  user--the 
reliability  of  a  sizable  set  of  forecasts.  This  can  be  done  by 
showing  a  set  of  forecast  probabilities  and  their  observed 
frequencies  for  the  home  location  as  given  in  the  routine 
probability  printout,  or  it  can  be  shown  in  the  usual  reliability 
diagram  (Fig.  7).  Reliability  is  an  important  attribute  of  such 
forecasts  and  knowing  the  high  degree  of  reliability  that  exists 
in  practically  all  sets  of  such  forecasts  should  increase  user 
confidence  in  and  acceptance  of  the  forecasts. 
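
A reliability table of this kind is simple to tabulate. The sketch below is illustrative only: the function name is hypothetical and the ten forecasts are made-up numbers, not data from any office's probability printout.

```python
from collections import defaultdict

def reliability_table(forecasts, outcomes):
    """Observed frequency of precipitation for each forecast probability.

    forecasts: issued probabilities in percent.
    outcomes:  1 if measurable precipitation occurred, else 0.
    Returns {forecast: (number of forecasts, observed frequency in percent)}.
    """
    counts = defaultdict(int)
    hits = defaultdict(int)
    for f, o in zip(forecasts, outcomes):
        counts[f] += 1
        hits[f] += o
    return {f: (counts[f], 100.0 * hits[f] / counts[f]) for f in sorted(counts)}

# Made-up sample, for illustration only.
forecasts = [20, 20, 20, 20, 20, 60, 60, 60, 60, 60]
outcomes = [0, 0, 1, 0, 0, 1, 1, 0, 1, 0]
for f, (n, obs) in reliability_table(forecasts, outcomes).items():
    print(f"forecast {f:3d}%  n={n}  observed {obs:.0f}%")
```

Forecasts are reliable to the extent that the observed frequency in each row is close to the forecast probability; a real sample would need to be far larger than this one for the comparison to mean anything.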

Discussion  of  how  they  are  made,  in  an  orderly  and  scientific  way, 
as  discussed  earlier  in  this  memo  (section  5c)  and  not  the  ways 
the  cartoons  show,  would  also  increase  confidence  in  the  forecasts. 
Then  there  is  the  trace-measurable  problem,  and  also  that  the 
amount  of  precipitation  tends  to  be  greater  with  the  larger 
probabilities  on  the  average,  as  discussed  earlier  (section  4f). 
Exactly  what  the  forecast  periods  are,  especially  one  like 
"remainder  of  tonight",  is  useful,  as  is  discussion  of  time  changes 
in skill--lesser use of the very high and very low probabilities
in the later forecast periods. Finally there is the areal coverage
problem, discussed in depth earlier (sections 4d and 6a), which
leads  to  why  you  still  are  carrying  a  40%  probability  when  "it  is 
raining  here  now",  and  the  concept  of  the  probability  being  the 
average  point  probability  for  the  local  area  when  only  one  value 
is  given  for  a  forecast  period. 

The  most  difficult  part  of  probability  forecasting  is  obtaining 
adequate  forecaster  understanding.  That  was  mentioned  earlier  too. 
To  do  a  good  job,  the  forecaster  should  know  all  the  answers  to 
any questions on the above points. Off-hand answers to questions
from the public may be incorrect, and they are very hard or impossible
to correct later in the minds of the public.

The public wants probability-type information, as noted by Murphy
and Winkler (1974b), because they have asked for such information
in areas where it is not now available. When we find a way to provide
such information, we should do so. We have already
started in some of the specialty forecasts--fire weather and
agriculture. It is likely that the expansion of probability into
other areas depends more on our knowledge that it gives better
service than on the public's interest and understanding.